Another g_markup_escape_text Vulnerability

Sat May 5 18:36:40 EDT 2012

On Fri, May 4, 2012 at 3:05 PM, Elliott Sales de Andrade
<qulogic at pidgin.im> wrote:
> On Fri, May 4, 2012 at 3:53 AM, Mark Doliner <mark at kingant.net> wrote:
>>
>> Hey Elliott, from looking at your patch it seems like there might
>> still be a few places where we don't validate the string as UTF-8.
>> For example:
>> - If no content-type is provided
>
> I'm actually not sure why that conversion is there, now that I look again.
> If there is no content-type, then `msn_cmdproc_process_msg` (which is called
> on the message immediately after, and leads to the crash) should ignore the
> message entirely.

Good point.  I think you're right.  So it looks like we can ignore this case.

>> - If content-type is not text/plain
>
> If it's not text/plain, then it may not necessarily be UTF-8 anyway. It
> could, for example, be P2P data. They definitely shouldn't be displayed, but
> there might be some other type of parsing going on, but I'd have to check
> the other content types to be sure.

Oh, I see what's happening.  msn_plain_msg() is only called when
content-type is text/plain, and so you're only ensuring UTF-8 when
content-type is text/plain.  And yeah, attempting to convert to UTF-8
when content-type is not text/plain would break binary data.  So we're
good here, too.

>> - If msg->charset is NULL and g_convert(from ISO-8859-1 to UTF-8) fails
>> - If msg->charset is set to something other than UTF-8 and
>> g_convert(msg->charset to UTF-8) fails and g_convert(ISO-8859-1 to
>> UTF-8) fails
>
> Is it even possible for ISO-8859-1->UTF-8 conversion to fail? I thought all
> bytes are valid ISO-8859-1 and there are no multi-byte characters. I tried
> converting a buffer of 255-0 which did not induce an error.

Oh, good point, I'd forgotten about that.  I went ahead and added a
check for it, just to be polite to people with broken systems.

I've attached an updated patch.  Changes from yours:
- re-ordered the if-else statements a little.  I think I removed one
level of depth from one place.
- Don't bother trying to convert anything if msg->body if NULL.  I'm
not sure this is needed... but if seems like it might be.
- I noticed that msn_oim_report_to_user() calls
msn_message_parse_payload() and it looks like it assumes msg->body is
UTF-8 without actually validating it or checking that content-type is
text/plain.  So I added a call to g_utf8_validate().

One remaining question I have... are we sure the headers will be valid
UTF-8?  I think we assume they're UTF-8 in a lot of places.

I'll be out of town for three weeks starting Tuesday, so I'm hoping to
release this tomorrow.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pidgin-msn-utf8-validation-v2.diff
Type: application/octet-stream
Size: 3195 bytes
Desc: not available
URL: <http://pidgin.im/cgi-bin/mailman/private/security/attachments/20120505/512d4f81/attachment.obj>