purple_conv_chat_cb_find's use of g_utf8_collate

Ethan Blanton elb at pidgin.im
Thu Apr 16 13:40:00 EDT 2009


Richard Laager spake unto us the following wisdom:
> On Tue, 2009-04-14 at 02:27 -0400, Zachary West wrote:
> > purple_conv_chat_cb_find() uses g_utf8_collate() to find
> > PurpleConvChatBuddys, but this is a _very, very_ slow function, taking
> > up a significant chunk of time when dealing with it.
> > 
> > Is there any reason for this function to be using g_utf8_collate()
> > instead of purple_strequal() and purple_normalize()ing the nicks? It's
> > just trying to identify which nicks are equal, does it need to take
> > the locale into consideration?
> 
> It looks like it's using g_utf8_collate() strictly for comparisons. I
> don't see how being "linguistically correct" is useful when you simply
> want to compare for equality. I haven't tested it, but from a quick read
> of the code, I don't know why purple_strequal() by itself wouldn't be
> sufficient.

Because the sequences U+0061 U+0301 (a followed by a combining acute
accent) and U+00E1 (precomposed á) both represent á, but do not
compare as identical byte strings.  In general, Unicode strings
*cannot* be reliably binary compared for equality.  g_utf8_collate()
reduces strings to something equivalent to a Unicode normal form (I'm
not actually sure that it uses a Unicode normal form, and if I recall
correctly this is specifically undefined) for a true comparison.
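
To make the byte-level mismatch concrete, here is a minimal C sketch. The
helper names (nick_bytes_equal, toy_compose, nick_equal_after_compose) are
hypothetical and are not part of libpurple or GLib. toy_compose handles only
this one sequence and merely stands in for a real normalizer such as GLib's
g_utf8_normalize(); it is an illustration under those assumptions, not what
purple_conv_chat_cb_find() does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* UTF-8 encoding of U+0061 U+0301: 'a' followed by a combining acute. */
static const char a_acute_decomposed[] = "a\xCC\x81";
/* UTF-8 encoding of U+00E1: precomposed a-with-acute. */
static const char a_acute_precomposed[] = "\xC3\xA1";

/* Hypothetical helper: plain byte-for-byte equality. */
static int nick_bytes_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Toy stand-in for a real normalizer (an assumption for illustration):
 * rewrites only the sequence 'a' U+0301 into U+00E1 and copies every
 * other byte unchanged.  Returns 0 on success, -1 if out is too small. */
static int toy_compose(const char *in, char *out, size_t out_size)
{
    size_t o = 0;

    if (out_size == 0)
        return -1;

    while (*in != '\0') {
        if (in[0] == 'a' &&
            (unsigned char)in[1] == 0xCC &&
            (unsigned char)in[2] == 0x81) {
            if (o + 2 >= out_size)
                return -1;
            out[o++] = (char)0xC3;
            out[o++] = (char)0xA1;
            in += 3;
        } else {
            if (o + 1 >= out_size)
                return -1;
            out[o++] = *in++;
        }
    }
    out[o] = '\0';
    return 0;
}

/* Hypothetical helper: bring both strings to the same form first, then
 * compare bytes.  Treats strings that do not fit the buffers as unequal. */
static int nick_equal_after_compose(const char *a, const char *b)
{
    char na[64];
    char nb[64];

    if (toy_compose(a, na, sizeof na) != 0 ||
        toy_compose(b, nb, sizeof nb) != 0)
        return 0;
    return strcmp(na, nb) == 0;
}
```

With these definitions, nick_bytes_equal(a_acute_decomposed,
a_acute_precomposed) returns 0, while nick_equal_after_compose() on the same
pair returns 1. A byte comparison is only meaningful once both sides are in
the same form.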

Ethan

-- 
The laws that forbid the carrying of arms are laws [that have no remedy
for evils].  They disarm only those who are neither inclined nor
determined to commit crimes.
		-- Cesare Beccaria, "On Crimes and Punishments", 1764