pidgin: 48377678: This adds an "auto-detect UTF-8" option ...

Ethan Blanton elb at pidgin.im
Fri Aug 1 18:29:04 EDT 2008


Richard Laager spake unto us the following wisdom:
> On Fri, 2008-08-01 at 15:50 -0400, elb at pidgin.im wrote:
> > This adds an "auto-detect UTF-8" option to IRC which, when enabled,
> > will treat any incoming text which validates as UTF-8 as UTF-8
> > regardless of the configured account encoding. It does not affect transmission
> 
> Isn't this the same as setting your encoding to "UTF-8,$encoding"? Or is
> the second part of this the key... that it won't transmit in UTF-8?

The second part is indeed the key.  Basically, what this allows is for
someone who normally transacts in an 8-bit single-byte encoding (such
as ISO-8859-*) to properly view UTF-8 if a random newcomer to a
channel unknowingly spews forth UTF-8 (which is increasingly common).

The trouble is that "UTF-8,$encoding" causes transmission in UTF-8,
and "$encoding,UTF-8" will never fall through to UTF-8 in such an
instance.  (Every byte sequence is valid in many 8-bit single-byte
encodings, so decoding as $encoding never fails.)
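
To make the difference concrete, here is a minimal sketch of the
behaviour described above. It is written in Python for brevity, not
the C that Pidgin uses, and it is not the actual patch: the function
names decode_incoming and encode_outgoing, and the rule that the first
listed encoding is the one used for sending, are assumptions made for
illustration.

```python
from typing import Sequence


def decode_incoming(raw: bytes, encodings: Sequence[str],
                    autodetect_utf8: bool) -> str:
    """Decode one incoming line (hypothetical helper, not Pidgin's API).

    `encodings` is the configured, non-empty, ordered encoding list.
    """
    # With the option on, anything that validates as UTF-8 is treated
    # as UTF-8, regardless of the configured encodings.
    if autodetect_utf8:
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            pass
    # Otherwise try the configured encodings in order.  A single-byte
    # encoding such as ISO-8859-1 accepts every byte sequence, so
    # anything listed after it is never reached.
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: decode with the first encoding, replacing bad bytes.
    return raw.decode(encodings[0], errors="replace")


def encode_outgoing(text: str, encodings: Sequence[str]) -> bytes:
    """Encode outgoing text with the first configured encoding
    (assumption: this is the encoding used for transmission)."""
    return text.encode(encodings[0], errors="replace")


utf8_bytes = "caf\u00e9".encode("utf-8")

# "$encoding,UTF-8": ISO-8859-1 accepts the bytes, so we see mojibake.
without_option = decode_incoming(utf8_bytes, ["iso-8859-1", "utf-8"], False)

# Option enabled: the same bytes are recognised as UTF-8 ...
with_option = decode_incoming(utf8_bytes, ["iso-8859-1"], True)

# ... while what we send still goes out in ISO-8859-1.
sent = encode_outgoing("caf\u00e9", ["iso-8859-1"])
```

In this sketch the option only changes the receive path, so a user
configured for ISO-8859-1 keeps sending ISO-8859-1 while reading
stray UTF-8 correctly.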

Ideally, text which is caught by this option should be marked in some
way (so that the recipient can say "hey, fix your encoding").

Ethan

-- 
The laws that forbid the carrying of arms are laws [that have no remedy
for evils].  They disarm only those who are neither inclined nor
determined to commit crimes.
		-- Cesare Beccaria, "On Crimes and Punishments", 1764