[Pidgin] #3874: ICQ group names in utf-8 read incorrectly

Pidgin trac at pidgin.im
Sat Jan 12 12:30:21 EST 2008


#3874: ICQ group names in utf-8 read incorrectly
---------------------+------------------------------------------------------
  Reporter:  beret   |       Owner:  MarkDoliner          
      Type:  defect  |      Status:  new                  
  Priority:  minor   |   Milestone:                       
 Component:  ICQ     |     Version:  2.2.2                
Resolution:          |    Keywords:  utf8 unicode encoding
   Pending:  0       |  
---------------------+------------------------------------------------------
Comment (by beret):

 I reported this defect for version 2.2.2, but it is still present in
 2.3.1.

 I live in the Czech Republic, so I have CP1250 set as the local encoding.
 I managed my contact list in the official client (ICQ6), which stores all
 aliases and group names in UTF-8. Clients like ICQ5.1, QIP or SIM all read
 them correctly. Pidgin, however, interprets them as CP1250 and converts
 them to UTF-8, which is wrong, even though it '''does''' write group names
 in UTF-8 when it manages the server-side contact list. See this group
 creation packet:

 {{{
 0000  00 a0 c5 69 98 99 00 00  e8 9f 26 ee 08 00 45 00   ...i.... ..&...E.
 0010  00 ae fa 4d 40 00 40 06  a8 b0 c0 a8 01 02 cd bc   ...M@.@. ........
 0020  07 e5 d3 1e 14 46 e1 74  8d 20 19 4b 34 15 50 18   .....F.t . .K4.P.
 0030  f9 b0 98 71 00 00 2a 02  00 cd 00 5c 00 13 00 08   ...q..*. ...\....
 0040  00 00 00 00 00 ae 00 1c  53 6b 75 70 69 6e 61 20   ........ Skupina
 0050  5f c3 a1 5f c3 a9 5f c5  99 5f c5 be 5f c5 a1 5f   _.._.._. ._.._.._
 0060  e2 80 a6 5f 00 04 00 00  00 01 00 06 00 c8 00 02   ..._.... ........
 0070  00 01 00 09 34 33 30 37  39 38 38 38 38 00 04 00   ....4307 98888...
 0080  01 00 00 00 13 01 31 00  0f 4b 6f 6e 74 61 6b 74   ......1. .Kontakt
 0090  20 73 6b 75 70 69 6e 79  2a 02 00 ce 00 1e 00 15    skupiny *.......
 00a0  00 02 00 00 00 00 00 af  00 01 00 10 0e 00 b8 c8   ........ ........
 00b0  f3 09 d0 07 af 00 ba 04  28 78 ad 19               ........ (x..
 }}}

 The group name here is "Skupina _á_é_ř_ž_š_…_" and the packet is sent by
 Pidgin.
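
 The two readings can be compared directly on the name bytes from the dump
 (offsets 0x48 to 0x63; the preceding 00 1c matches their length of 28
 bytes). A minimal Python sketch, with the bytes copied by hand from the
 dump and purely illustrative variable names:

```python
# Group name bytes copied from the packet dump, offsets 0x48..0x63.
name_bytes = bytes.fromhex(
    "53 6b 75 70 69 6e 61 20"
    "5f c3 a1 5f c3 a9 5f c5 99 5f c5 be 5f c5 a1 5f"
    "e2 80 a6 5f"
)

# Correct reading: the bytes are UTF-8.
as_utf8 = name_bytes.decode("utf-8")

# Wrong reading: every byte is treated as one CP1250 character, so each
# multi-byte UTF-8 sequence turns into two or three unrelated characters.
as_cp1250 = name_bytes.decode("cp1250")

print(len(name_bytes))   # byte length of the name
print(as_utf8)
print(as_cp1250)
```

 The CP1250 reading succeeds without any error, which is why the problem
 shows up only as garbled group names and not as a failure.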

 So if Pidgin uses UTF-8 for storing groups, it should also use it for
 reading their names. See the patch I attached (thanks to elb).

 Unfortunately, the server does not seem to indicate which encoding group
 names are stored in. It is therefore advisable to interpret a name as
 UTF-8 if it validates and to fall back to the local encoding if it
 doesn't. If I'm not mistaken, this is already done for contacts' aliases
 in libpurple/oscar, so the approach should not be controversial.
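
 The fallback described above can be sketched as follows. This is only an
 illustration in Python, not the attached patch and not libpurple code; the
 function name `decode_group_name` and its `fallback_encoding` parameter are
 hypothetical, and CP1250 stands in for whatever the local encoding is.

```python
def decode_group_name(raw: bytes, fallback_encoding: str = "cp1250") -> str:
    """Decode a group name, preferring UTF-8 (hypothetical helper)."""
    try:
        # Strict decoding doubles as validation: it raises on any byte
        # sequence that is not well-formed UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the name was stored in the local
        # encoding. Undecodable bytes become U+FFFD instead of raising.
        return raw.decode(fallback_encoding, errors="replace")


# A name stored as UTF-8 (as ICQ6 does) is read as UTF-8.
print(decode_group_name("Skupina _ř_ž_".encode("utf-8")))

# A name stored in CP1250 is not valid UTF-8 and takes the fallback path.
print(decode_group_name("Přátelé".encode("cp1250")))
```

 Pure ASCII names decode identically either way. A name in the local
 encoding that also happens to be valid UTF-8 would be misread, but such
 byte sequences are unlikely in ordinary text.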

-- 
Ticket URL: <http://developer.pidgin.im/ticket/3874#comment:1>
Pidgin <http://pidgin.im>