[Pidgin] #1645: ICQ Encoding Problems
Pidgin
trac at pidgin.im
Wed Apr 16 12:12:10 EDT 2008
#1645: ICQ Encoding Problems
-----------------------+----------------------------------------------------
Reporter: I4ko | Owner: elb
Type: defect | Status: new
Priority: critical | Milestone: 2.4.2
Component: ICQ | Version: 2.0.1
Resolution: | Keywords:
Pending: 0 |
-----------------------+----------------------------------------------------
Comment (by elb):
Replying to [comment:67 kalin]:
> Well, I partially agree with that, it will be more of a RFE, although I
> guess it will not be too difficult to implement if you think that the
> encoding in the prefs is what your client sends, and the encoding coming
> from the network is either autodetected or is included in the stream (I
> guess, for some protocols at least). In almost all cases I can write in
> Bulgarian or Japanese and people read properly (since pidgin is doing the
> right thing, I guess). Problem is reading what they say.
The problem here is that ICQ, specifically, does ''not'' include the
encoding in the stream -- and autodetection is more or less useless. You
can, for example, autodetect with some success if you know that the stream
is "Cyrillic" or "Japanese", and you have a table of encodings and some
sort of heuristic to say "oh, too many capitals, this must be KOI8-R and
not Windows-1251" or whatever (with Japanese encodings it's even a bit
easier, because of the shift encodings). However, to throw characters on
the ground and say "what is this?" without some pretty specific hints is
not feasible.
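To make the "too many capitals" idea concrete, here is a minimal Python sketch. It is not Pidgin code; the function names and the two-entry candidate table are made up for illustration, and it assumes you already know the text is Cyrillic. It works because KOI8-R and Windows-1251 place lowercase and uppercase Cyrillic letters in swapped byte ranges, so decoding with the wrong one tends to flip the case.

```python
# Hypothetical sketch: choose between two Cyrillic encodings by letter case.
# Assumes the caller already knows the message is Cyrillic.
CANDIDATES = ("cp1251", "koi8_r")

def uppercase_ratio(text):
    """Fraction of alphabetic characters that are uppercase."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch.isupper() for ch in letters) / len(letters)

def guess_cyrillic(raw):
    """Decode with each candidate and keep the one with the fewest capitals."""
    scored = []
    for name in CANDIDATES:
        try:
            text = raw.decode(name)
        except UnicodeDecodeError:
            continue  # bytes are not valid in this encoding at all
        scored.append((uppercase_ratio(text), name))
    if not scored:
        return None
    return min(scored)[1]

sample = "привет, как дела?"
print(guess_cyrillic(sample.encode("koi8_r")))
print(guess_cyrillic(sample.encode("cp1251")))
# A message typed in all capitals defeats the heuristic:
print(guess_cyrillic("ПРИВЕТ".encode("cp1251")))
```

Note that the sketch only works because the candidate list is tiny and the script is known in advance; feed it Shift_JIS bytes and it will still happily pick one of the two Cyrillic encodings.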
If you set your outgoing encoding to UTF-8, most of the official clients
will handle that; this is why people can read what you send. However, as
you see, UTF-8 is not what the official clients ''send''.
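As a rough illustration of why a configured encoding is still needed on the receiving side, here is a small Python sketch of "try UTF-8, otherwise fall back to the preference". This is an assumption about one reasonable strategy, not a description of what Pidgin's ICQ code does; `decode_incoming` and its `fallback` parameter are hypothetical names.

```python
# Hypothetical sketch: strict UTF-8 first, then the user's configured encoding.
def decode_incoming(raw, fallback="cp1251"):
    try:
        # Strict UTF-8 decoding rejects most legacy 8-bit Cyrillic text.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Nothing in the stream says which legacy encoding was used,
        # so the result is only as good as the configured fallback.
        return raw.decode(fallback, errors="replace")

print(decode_incoming("привет".encode("utf-8")))
print(decode_incoming("привет".encode("cp1251")))
# Wrong preference for this buddy: the text decodes, but to the wrong letters.
print(decode_incoming("привет".encode("cp1251"), fallback="koi8_r"))
```

The last call is the situation described in this ticket: one global fallback cannot be right for buddies whose clients send different legacy encodings.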
What you are seeing is a severe limitation (I would say bug) in the ICQ
protocol itself. Unfortunately, we can't do a whole lot about it. In
your case, you would need per-buddy or per-group (or similar) encoding
preferences -- and we're not willing to compromise that much.
> Speaking 5 languages, Cyrillic with 5 encodings and Japanese with 3 is
> not a fun soup to be drowned into. And it means that I cannot talk to a
> Shift_JIS and Win-1251 clients at the same time anyway.
In ICQ, specifically, you shouldn't see that much diversity. I think the
ICQ Windows and Mac clients ''do'' use different encodings for Cyrillic
languages (though I don't remember for sure), but you should not see more
than one Japanese encoding in our experience. There are also some
official client encoding bugs in ICQ which cause other "encodings"
(gibberish strings, really) to show up, but I believe we identified
and worked around those some time ago.
> > This is the biggest problem we have, is finding people using official
> > clients to test against.
> OK, at least I know the problem ;-)
>
> It seems somebody has to go deep into enemy territory, no other way...
> fine with me, I'll do it.
>
> I'll set up a VirtualBox with WinXP guest and put there the official
> client working with another icq#. Then if you help me, we can do some
> testing and debugging.
That would be helpful. I cannot guarantee how much time I can spend on it
in the short term, but if we can get a solid inventory of what does and
doesn't work against the official client (i.e., a matrix of the various
features -- offline, status, invitation, normal message, etc. -- and
whether or not they work when the encoding preference is set
appropriately) and some data on what encodings the official client uses
where, that would be awesome. At this point, at least from my point of
view, we're data-limited.
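For what it's worth, the inventory could be kept as something as simple as the following Python sketch. The feature names come from the list above; the encoding list and the helper names are placeholders I made up, to be replaced with whatever the official client turns out to use.

```python
import itertools

# Features named above; the encodings are placeholders (assumption).
FEATURES = ["offline message", "status message", "invitation", "normal message"]
ENCODINGS = ["UTF-8", "Windows-1251", "KOI8-R", "Shift_JIS"]

def empty_matrix(features, encodings):
    """One cell per (feature, encoding); None means not tested yet."""
    return {(f, e): None for f, e in itertools.product(features, encodings)}

def render(matrix, features, encodings):
    """Plain-text report, one line per feature."""
    labels = {None: "untested", True: "works", False: "broken"}
    lines = []
    for f in features:
        cells = ", ".join(f"{e}: {labels[matrix[(f, e)]]}" for e in encodings)
        lines.append(f"{f} -> {cells}")
    return "\n".join(lines)

matrix = empty_matrix(FEATURES, ENCODINGS)
# Example entry only, not a real test result:
matrix[("normal message", "UTF-8")] = True
print(render(matrix, FEATURES, ENCODINGS))
```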
--
Ticket URL: <http://developer.pidgin.im/ticket/1645#comment:68>
Pidgin <http://pidgin.im>