[Pidgin] #3874: ICQ group names in utf-8 read incorrectly
Pidgin
trac at pidgin.im
Sat Jan 12 12:30:21 EST 2008
#3874: ICQ group names in utf-8 read incorrectly
---------------------+------------------------------------------------------
Reporter: beret | Owner: MarkDoliner
Type: defect | Status: new
Priority: minor | Milestone:
Component: ICQ | Version: 2.2.2
Resolution: | Keywords: utf8 unicode encoding
Pending: 0 |
---------------------+------------------------------------------------------
Comment (by beret):
I reported this defect for version 2.2.2, but it is still present in
2.3.1.
I live in the Czech Republic, so I have CP1250 set as the local encoding.
I managed my contact list in the official client (ICQ6), which stores all
aliases and group names in UTF-8. Clients like ICQ5.1, QIP or SIM all read
them correctly. Pidgin, however, interprets them as CP1250 when reading and
converts them to UTF-8, which is wrong, even though it '''does''' use UTF-8
for groups when it manages the server-side contact list. See this group
creation packet:
{{{
0000 00 a0 c5 69 98 99 00 00 e8 9f 26 ee 08 00 45 00 ...i.... ..&...E.
0010 00 ae fa 4d 40 00 40 06 a8 b0 c0 a8 01 02 cd bc ...M@.@. ........
0020 07 e5 d3 1e 14 46 e1 74 8d 20 19 4b 34 15 50 18 .....F.t . .K4.P.
0030 f9 b0 98 71 00 00 2a 02 00 cd 00 5c 00 13 00 08 ...q..*. ...\....
0040 00 00 00 00 00 ae 00 1c 53 6b 75 70 69 6e 61 20 ........ Skupina
0050 5f c3 a1 5f c3 a9 5f c5 99 5f c5 be 5f c5 a1 5f _.._.._. ._.._.._
0060 e2 80 a6 5f 00 04 00 00 00 01 00 06 00 c8 00 02 ..._.... ........
0070 00 01 00 09 34 33 30 37 39 38 38 38 38 00 04 00 ....4307 98888...
0080 01 00 00 00 13 01 31 00 0f 4b 6f 6e 74 61 6b 74 ......1. .Kontakt
0090 20 73 6b 75 70 69 6e 79 2a 02 00 ce 00 1e 00 15 skupiny *.......
00a0 00 02 00 00 00 00 00 af 00 01 00 10 0e 00 b8 c8 ........ ........
00b0 f3 09 d0 07 af 00 ba 04 28 78 ad 19 ........ (x..
}}}
The group name here is "Skupina _á_é_ř_ž_š_…_" and the packet is sent by
Pidgin.
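
To make the mismatch concrete, here is a minimal Python sketch that decodes
the name bytes from the dump both ways. It assumes the two bytes before the
name (00 1c at offset 0x46) are a big-endian length prefix, which matches the
28 bytes of the name; the variable names are only illustrative.

```python
import struct

# Bytes 0x46-0x63 of the dump above: an assumed 2-byte big-endian length
# (00 1c = 28) followed by the group name itself.
field = bytes.fromhex(
    "00 1c "
    "53 6b 75 70 69 6e 61 20 "
    "5f c3 a1 5f c3 a9 5f c5 99 5f c5 be 5f c5 a1 5f "
    "e2 80 a6 5f"
)

(name_len,) = struct.unpack_from(">H", field, 0)
name_bytes = field[2:2 + name_len]

# Correct reading: the bytes are UTF-8.
as_utf8 = name_bytes.decode("utf-8")

# Reading the same bytes as CP1250 splits every multi-byte character
# into two or three unrelated characters.
as_cp1250 = name_bytes.decode("cp1250")

print(as_utf8)
print(as_cp1250)
```

The first line printed is the intended group name; the second is the garbled
form that results from treating the same bytes as CP1250.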
So if Pidgin uses UTF-8 for storing groups, it should also use it for
reading their names. See the patch I attached (thanks to elb).
Unfortunately, the server does not seem to indicate which encoding group
names are stored in. It is therefore advisable to interpret a name as UTF-8
if it validates and to fall back to the local encoding if it doesn't. If I'm
not mistaken, this is already done for the contacts' aliases in
libpurple/oscar, so the approach should not be controversial.
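
The validate-then-fall-back idea can be sketched in a few lines of Python.
This is only an illustration of the logic, not the attached patch and not
libpurple code; the function name `decode_group_name` and the `cp1250`
default are assumptions chosen for this example.

```python
def decode_group_name(raw: bytes, local_encoding: str = "cp1250") -> str:
    """Decode a group name, preferring UTF-8 when the bytes validate.

    Hypothetical helper: the name and the default encoding are assumptions
    made for this sketch.
    """
    try:
        # A strict decode doubles as validation: it raises on any
        # byte sequence that is not well-formed UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the local encoding. Undecodable
        # bytes are replaced rather than raising a second time.
        return raw.decode(local_encoding, errors="replace")


# "Skupina ř" stored as UTF-8 (ř is the two bytes c5 99).
utf8_name = decode_group_name(b"Skupina \xc5\x99")

# The same name stored as CP1250 (ř is the single byte f8, which is
# never valid in UTF-8, so the fallback path is taken).
cp1250_name = decode_group_name(b"Skupina \xf8")
```

Both calls yield the same string. The heuristic can still misfire when
CP1250 text happens to form valid UTF-8, but for real group names that
should be rare.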
--
Ticket URL: <http://developer.pidgin.im/ticket/3874#comment:1>
Pidgin <http://pidgin.im>
Pidgin