[Pidgin] #1645: ICQ Encoding Problems

Wed Apr 16 12:12:10 EDT 2008

#1645: ICQ Encoding Problems
-----------------------+----------------------------------------------------
  Reporter:  I4ko      |       Owner:  elb  
      Type:  defect    |      Status:  new  
  Priority:  critical  |   Milestone:  2.4.2
 Component:  ICQ       |     Version:  2.0.1
Resolution:            |    Keywords:       
   Pending:  0         |  
-----------------------+----------------------------------------------------
Comment (by elb):

 Replying to [comment:67 kalin]:
 > Well, I partially agree with that, it will be more of a RFE, although I
 guess it will not be too difficult to implement if you think that the
 encoding in the prefs is what your client sends, and the encoding coming
 from the network is either autodetected or is included in the stream (I
 guess, for some protocols at least). In almost all cases I can write in
 Bulgarian or Japanese and people read properly (since pidgin is doing the
 right thing, I guess). Problem is reading what they say.

 The problem here is that ICQ, specifically, does ''not'' include the
 encoding in the stream -- and autodetection is more or less useless.  You
 can, for example, autodetect with some success if you know that the stream
 is "Cyrillic" or "Japanese", and you have a table of encodings and some
 sort of heuristic to say "oh, too many capitals, this must be KOI8-R and
 not Windows-1251" or whatever (with Japanese encodings it's even a bit
 easier, because of the shift encodings).  However, to throw characters on
 the ground and say "what is this?" without some pretty specific hints is
 not feasible.
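
 To make the "too many capitals" heuristic concrete, here is a minimal
 sketch in Python. It is not Pidgin code. The function name, the
 two-entry candidate list, and the scoring rule are all assumptions made
 for illustration. It works only because the caller already supplies the
 hint that the text is Cyrillic: KOI8-R and Windows-1251 place upper and
 lower case in swapped byte ranges, so the wrong guess turns ordinary
 lowercase text into mostly capitals.

```python
# Hypothetical helper: pick the likeliest encoding from a SHORT list of
# candidates that the caller already knows are plausible (here: Cyrillic).
CANDIDATES = ("cp1251", "koi8_r")  # assumed candidate table


def guess_cyrillic_encoding(raw: bytes, candidates=CANDIDATES) -> str:
    def score(encoding: str) -> float:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            # Bytes that are undefined in this encoding rule it out.
            return float("-inf")
        letters = [c for c in text if "\u0400" <= c <= "\u04ff"]
        lower = sum(1 for c in letters if c.islower())
        upper = len(letters) - lower
        # Ordinary prose is mostly lowercase, so "too many capitals"
        # suggests the wrong encoding was used.
        return lower - upper

    return max(candidates, key=score)
```

 Without the candidate list there is nothing to score against, which is
 the point made above: the heuristic refines a hint, it does not replace
 one.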

 If you set your outgoing encoding to UTF-8, most of the official clients
 will handle that; this is why people can read what you send.  However, as
 you see, UTF-8 is not what the official clients ''send''.
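
 The asymmetry can be sketched as follows. This is an illustration only,
 not how Pidgin's ICQ code is written; `decode_incoming` and its
 `fallback` parameter are hypothetical names standing in for "the
 encoding set in the account preferences". Strict UTF-8 decoding rejects
 most legacy-encoded text, so the receiver has to fall back to a
 configured encoding, and that single fallback is only right for some of
 the people you talk to.

```python
def decode_incoming(raw: bytes, fallback: str = "cp1251") -> str:
    """Decode a message whose encoding is not stated in the stream.

    `fallback` is an assumed stand-in for a per-account preference.
    """
    try:
        # Strict decoding: legacy 8-bit or Shift_JIS text almost never
        # happens to be well-formed UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8, so trust the configured encoding. If the sender
        # used some other encoding, the result is gibberish.
        return raw.decode(fallback, errors="replace")
```

 With `fallback="cp1251"`, a Shift_JIS sender's text comes out garbled,
 and vice versa, which is the situation described in this ticket.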

 What you are seeing is a severe limitation (I would say bug) in the ICQ
 protocol itself.  Unfortunately, we can't do a whole lot about it.  In
 your case, you would need per-buddy or per-group (or similar) encoding
 preferences -- and we're not willing to compromise that much.

 > Speaking 5 languages, Cyrillic with 5 encodings and Japanese with 3 is
 not a fun soup to be drowned into. And it means that I cannot talk to a
 Shift_JIS and Win-1251 clients at the same time anyway.

 In ICQ, specifically, you shouldn't see that much diversity.  I think the
 ICQ Windows and Mac clients ''do'' use different encodings for Cyrillic
 languages (though I don't remember for sure), but you should not see more
 than one Japanese encoding in our experience.  There are also some
 official client encoding bugs in ICQ which cause other "encodings"
 (gibberish strings, really) to show up, but I believe we have identified
 and worked around those for some time now.

 > > This is the biggest problem we have, is finding people using official
 clients to test against.
 > OK, at least I know the problem ;-)
 >
 > It seems somebody has to go deep into enemy territory, no other way...
 fine with me, I'll do it.
 >
 > I'll set up a VirtualBox with WinXP guest and put there the official
 client working with another icq#. Then if you help me, we can do some
 testing and debugging.

 That would be helpful.  I cannot guarantee how much time I can spend on it
 in the short term, but if we can get a solid inventory of what does and
 doesn't work against the official client (i.e., a matrix of the various
 features -- offline, status, invitation, normal message, etc. -- and
 whether or not they work when the encoding preference is set
 appropriately) and some data on what encodings the official client uses
 where, that would be awesome.  At this point, at least from my point of
 view, we're data-limited.

-- 
Ticket URL: <http://developer.pidgin.im/ticket/1645#comment:68>
Pidgin <http://pidgin.im>