security review and patches for libpurple

Sun Jul 17 18:04:42 EDT 2011

On 07/17/2011 02:54 PM, Ethan Blanton wrote:
> Dan Auerbach spake unto us the following wisdom:
>>> I also note a number of changes of the form:
>>>
>>>   buffer = malloc(some expression);
>>>   [...]
>>> -strcpy(buffer, source);
>>> +g_strlcpy(buffer, source, some expression);
>>>
>>> This is not substantially superior to the preexisting strcpy.  It may
>>> silence some automated tools, but it introduces an additional way in
>>> which the code in question can be wrong.  Specifically, the expression
>>> which is duplicated in the malloc and strlcpy can diverge with
>>> disastrous consequences.  Assuming the expression is correct, the
>>> strcpy implementation is more robust than the strlcpy implementation.
>>> If the expression is incorrect, all bets are off in both
>>> implementations without more contextual information.
>> I understand the concern about the expressions diverging. However, I
>> think the benefits of following the standard of using strlcpy outweigh
>> these concerns. There are two points that I would raise in support:
>>
>> 1. There are many other ways in which the code might change in which
>>     leaving strcpy would actually be worse than strlcpy. For example,
>>     code might be copied and pasted such that malloc was no longer
>>     invoked before strcpy. So I'd say analysis above that strcpy is
>>     more robust is true modulo certain assumptions about how the code
>>     may change with time, but that without those assumptions it's not
>>     as clear as to which implementation is safer in the long run.
> Sure.  I'm not sure, on a whole, the strlcpy is really safer, though,
> without more work (see below).
>
>> 2. More generally, there may be situations in which the semantics of
>>     an "unsafe" function such as strcpy are more desirable for a number
>>     of reasons. But I think it still is good to exercise discipline and
>>     stick to safe functions, even if they have locally undesirable
>>     effects, for the long-term security and stylistic consistency of
>>     the code base.
> Maybe.  That depends a *lot* on context.  This is a real disadvantage
> of static analysis tools -- the person reading the output has to take
> a careful look at what the code actually does.  There are other errors
> in the patches which I believe are the result of static analysis
> limitations (such as a +1 added to a malloc where it doesn't belong).
> I bring this up as a general "gotcha" to keep an eye on, not as
> specific criticism.
I agree that looking at the code carefully is essential to good 
application of static analysis tools. And though I tried to do this, 
clearly my brain was off some of the time :) But I'd still say that 
eliminating unsafe functions is a good idea, independent of the valid 
limitations of static analysis.

>> That said, there are other ways to change the code that use safe
>> functions and also avoid or minimize the concern above, and looking
>> into this is certainly worthwhile. For example, there could be a
>> wrapper that handles the malloc call and saves the mallocd size
>> "some expression" to another variable, used later in strlcpy. Even
>> the small step of setting "var = some expression", and then passing
>> in var both to malloc and to strlcpy helps to mitigate the concern,
>> and should be implemented. That we failed to implement these easy
>> mitigation strategies is an error, and in the future we will try to
>> be more diligent in making the patches as good as possible.
> In quite a few places, I have added a variable to store the results of
> the expression.  Things like:
>
> +len = strlen(str) + 1;
> -buf = malloc(strlen(str) + 1);
> -g_strlcpy(buf, str, strlen(str) + 1);
> +buf = malloc(len);
> +g_strlcpy(buf, str, len);
>
> In general, this is a much superior solution.  This was actually the
> point of my note, not that strlcpy is *inferior* to strcpy.  Bounds
> checks with on-the-spot calculated bounds must be used *very*
> carefully, whereas bounds which are tied by the compiler to other
> manipulations and allocations are much safer.
>
> The real cake is, of course, sizeof() where applicable.  :-)
Ah I see, yes that makes sense of course, and yes, sizeof is ideal.

> I have corrected these patterns in the patches where they occurred,
> and I have corrected fencepost errors where they occurred.  I've
> applied about half of the patches with or without some modification.
> Some of the patches have more serious errors or make assumptions I
> need to validate before they can be applied.  I hope to send a list of
> those patches and my specific concerns "soon".
>
> Thanks for your continued input,
> Ethan

Thanks for looking at this in detail and applying the superior solution 
where possible. I should confess that the strlcpy part of the series of 
patches we sent is probably the most error-prone, relied most heavily on 
static analysis, and given the errors you've found, probably should be 
taken more as a guide.

Dan