Optimize jabber_id_new()

Tue Jun 30 04:31:15 EDT 2009

The jabber_id_new() function in libpurple/protocols/jabber/jutil.c is
pretty expensive.  It creates a JabberID struct from the string
version of a Jabber username (e.g. it splits
"mark.doliner@gmail.com/Home" into "mark.doliner", "gmail.com" and
"Home").  It also lowercases the node and domain, does UTF-8
normalization, and does stringprep validation to ensure the JID is
composed only of characters allowed by the XMPP RFC.

We've optimized this function at Meebo.  In our testing we found that
the vast majority of JIDs are made of these characters: a-z A-Z 0-9 @
/ { | } ~ . [ \ ] ^ _ ;  And so we do a quick first pass over the
given string.  If the string contains only these characters then we
skip both g_utf8_normalize() and stringprep, and only lowercase the
node and domain.  Otherwise we do everything.
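
To make the fast path concrete, here is a minimal sketch of the
check, not the code from the attached patch.  The helper names
jid_is_simple_ascii() and jid_lowercase_bare() are hypothetical and
do not exist in libpurple.  The sketch assumes the trailing ";" in
the list above is punctuation rather than an allowed character, and
that everything before the first "/" is the node and domain.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a libpurple function): returns true when
 * every byte of str is one of the "common" characters listed above.
 * Assumption: ';' is NOT part of the set. */
static bool jid_is_simple_ascii(const char *str)
{
    const unsigned char *c;

    for (c = (const unsigned char *)str; *c != '\0'; c++) {
        if ((*c >= 'a' && *c <= 'z') ||
            (*c >= 'A' && *c <= 'Z') ||
            (*c >= '0' && *c <= '9') ||
            *c == '@' || *c == '/' || *c == '.' ||
            (*c >= '[' && *c <= '_') ||   /* [ \ ] ^ _  (0x5B-0x5F) */
            (*c >= '{' && *c <= '~'))     /* { | } ~    (0x7B-0x7E) */
            continue;
        return false;   /* anything else needs the slow path */
    }
    return true;
}

/* Hypothetical helper: lowercase ASCII letters in place, stopping at
 * the first '/' so the resource keeps its case. */
static void jid_lowercase_bare(char *jid)
{
    char *c;

    for (c = jid; *c != '\0' && *c != '/'; c++) {
        if (*c >= 'A' && *c <= 'Z')
            *c = (char)(*c - 'A' + 'a');
    }
}
```

A caller would run jid_is_simple_ascii() first; on true it would call
jid_lowercase_bare() and skip normalization and stringprep, and on
false it would fall back to the existing full path.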

How do people feel about me checking this change into the jabber code
in libpurple?  Meebo probably has a larger percentage of
English-speaking users than Pidgin, so maybe our results are unfairly
biased.  Does anyone know how common non-ASCII JIDs are?

I suspect that even when the JID contains non-ASCII characters our
optimized version won't be much slower, and might even be faster: it
makes one pass over the string to find the @ and / instead of calling
g_utf8_strchr() twice (though that's easy to fix on its own).
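
As an illustration of the single-pass idea (again a sketch, not the
patch itself), the scan below finds both delimiters in one loop and
also notes whether any non-ASCII byte was seen.  The struct jid_scan
and jid_scan_once() names are hypothetical.  It assumes that only an
"@" before the first "/" separates node from domain, since the
resource may itself contain "@" or "/".  Scanning bytes is safe on
UTF-8 input because ASCII bytes never occur inside a multi-byte
sequence.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type for the one-pass scan. */
struct jid_scan {
    const char *at;     /* first '@' before any '/', or NULL */
    const char *slash;  /* first '/', or NULL */
    bool ascii_only;    /* false if any byte >= 0x80 was seen */
};

/* Hypothetical helper: locate the delimiters in a single pass. */
static struct jid_scan jid_scan_once(const char *str)
{
    struct jid_scan r = { NULL, NULL, true };
    const unsigned char *c;

    for (c = (const unsigned char *)str; *c != '\0'; c++) {
        if (*c == '@' && r.at == NULL && r.slash == NULL)
            r.at = (const char *)c;      /* node/domain separator */
        else if (*c == '/' && r.slash == NULL)
            r.slash = (const char *)c;   /* start of the resource */
        else if (*c >= 0x80)
            r.ascii_only = false;        /* needs the slow path */
    }
    return r;
}
```

With the two pointers in hand, the node, domain and resource can be
copied out without searching the string again.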

In other words: How does everyone feel about the attached patch?

-Mark
-------------- next part --------------
A non-text attachment was scrubbed...
Name: optimize_jabber_id_new.diff
Type: text/x-patch
Size: 3001 bytes
Desc: not available
URL: <http://pidgin.im/pipermail/devel/attachments/20090630/bafce4a0/attachment-0002.bin>