[Pidgin] #1645: Some cyrillic problems
Pidgin
trac at pidgin.im
Wed Jun 13 12:01:08 EDT 2007
#1645: Some cyrillic problems
---------------------------+------------------------------------------------
Reporter: I4ko | Owner:
Type: defect | Status: new
Priority: minor | Milestone:
Component: pidgin (gtk) | Version: 2.0.1
Resolution: | Keywords:
Pending: 0 |
---------------------------+------------------------------------------------
Comment (by elb):
What is in those packets is *not* UTF-8; that is what I'm telling you. The
text inside the away message block is in a two-byte encoding (look at it
-- the HTML bytes (which are ASCII) are all preceded by a 0x00 byte; the
character Р is encoded as 0x04, 0x20, which is U+0420 in UCS-2BE, which
the encoding tag claims is the encoding of that message). There should be
no UTF-8 involved anywhere in this process; I have been overlooking your
mention of it, assuming that you simply believe that all Unicode is UTF-8.
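
To make the byte layout concrete, here is a small Python sketch. The sample
string is made up for illustration and is not taken from the capture; it only
shows how the same text looks in UCS-2BE versus UTF-8.

```python
# Hypothetical sample (not from the capture): an HTML tag followed by
# the Cyrillic capital letter Er, U+0420.
sample = "<HTML>\u0420"

# For characters in the Basic Multilingual Plane, Python's "utf-16-be"
# codec produces the same bytes as UCS-2BE.
ucs2be = sample.encode("utf-16-be")
utf8 = sample.encode("utf-8")

# Every ASCII byte is preceded by 0x00, and Er is the pair 0x04 0x20.
print(ucs2be.hex(" "))  # 00 3c 00 48 00 54 00 4d 00 4c 00 3e 04 20

# In UTF-8 the ASCII bytes stand alone and Er is the pair 0xd0 0xa0.
print(utf8.hex(" "))    # 3c 48 54 4d 4c 3e d0 a0
```

The leading 0x00 bytes before each ASCII character are the quickest way to
tell by eye that a block is in a two-byte encoding and not UTF-8.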
The fact that you recognize that string as a Windows-1251 rendering of
UTF-8 leads me to believe that perhaps the remote client converted the
away message from Windows-1251 to UTF-8, and then converted the resulting
UTF-8 bytes to UCS-2BE as though they were still Windows-1251; this is, of
course, nonsensical, but it would produce the string which is _actually
stored_ in that capture file. (Please do not "disagree" with me about what
is in the packet; look at it yourself and confirm that it is, in fact, what
I am claiming it is. You can do this by converting the text in the away
message to Windows-1251, and then displaying it as UTF-8 -- you will see
the correct away message.)
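
The suspected double conversion, and the check described above, can be
sketched in Python as follows. This is only a model of what the remote client
*might* be doing: the function names and the sample text are hypothetical,
and nothing here is taken from the capture file or from Pidgin's code.

```python
def broken_client_encode(text: str) -> bytes:
    """Hypothetical model of the suspected bug in the remote client."""
    utf8_bytes = text.encode("utf-8")      # step 1: text -> UTF-8 bytes
    misread = utf8_bytes.decode("cp1251")  # step 2: bytes misread as Windows-1251
    return misread.encode("utf-16-be")     # step 3: sent as UCS-2BE


def undo_double_encoding(wire: bytes) -> str:
    """Convert the received text to Windows-1251, then read it as UTF-8."""
    received = wire.decode("utf-16-be")
    return received.encode("cp1251").decode("utf-8")


original = "Привет"  # made-up sample away message
wire = broken_client_encode(original)

# The receiver decodes exactly what the encoding tag says, and gets mojibake.
received = wire.decode("utf-16-be")
repaired = undo_double_encoding(wire)

print(wire[:2].hex(" "))  # 04 20, i.e. U+0420, as seen in the packet
print(received)
print(repaired)
```

Because 0xD0 and 0xD1 are the usual lead bytes of Cyrillic text in UTF-8, and
Windows-1251 maps them to Р and С, the mangled string is dominated by those
two letters. Note that this round trip is not guaranteed to work for every
input: some UTF-8 byte values may have no Windows-1251 mapping in a given
codec, in which case the conversion raises an error.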
There is no way for us to predict this complete brokenness. If they are
truly using the official client, then it is broken and mishandling
encodings. I do not know how we can detect this situation and correct for
it.
--
Ticket URL: <http://developer.pidgin.im/ticket/1645#comment:5>
Pidgin <http://pidgin.im>