Optimize jabber_id_new()

Peter Saint-Andre stpeter at stpeter.im
Tue Jun 30 12:38:04 EDT 2009

On 6/30/09 10:33 AM, Ethan Blanton wrote:
> Mark Doliner spake unto us the following wisdom:
>> The jabber_id_new() function in libpurple/protocols/jabber/jutil.c is
>> pretty expensive.  It creates a JabberID struct given the string
>> version of a Jabber username (e.g. it splits
>> "mark.doliner@gmail.com/Home" into "mark.doliner", "gmail.com" and
>> "Home").  It also lowercases the node and domain, does UTF-8
>> normalization, and does stringprep validation to ensure the JID is
>> composed only of characters allowed by the XMPP RFC.
>>
>> We've optimized this function at Meebo.  In our testing we found that
>> the vast majority of JIDs are made of these characters: a-z A-Z 0-9 @
>> / { | } ~ . [ \ ] ^ _ ;  And so we do a quick first pass over the
>> given string.  If the string contains only these characters then we
>> skip g_utf8_normalize() and skip stringprep and only lowercase the
>> node and domain.  Otherwise we do everything.
>>
>> How do people feel about me checking this change into the jabber code
>> in libpurple?  Meebo probably has a larger percentage of
>> English-speaking users than Pidgin, so maybe our results are unfairly
>> biased.  Does anyone know how common non-ASCII JIDs are?
> This seems very reasonable to me.  If the "expensive" checks are
> expensive enough that Meebo cares about them, we should avoid them
> when they are unnecessary.  If it turns out that the short-circuit
> checks are too expensive when JIDs *are* non-ASCII (which, looking at
> the source, I doubt), we can revisit this.  I share your and
> Daniel's intuition that most JIDs will be ASCII anyway.

This is my experience as well (e.g., at the jabber.org service). Sounds
like a smart optimization to me.
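
For concreteness, the first pass described in the quoted message could be
sketched roughly as below. This is only an illustration, not the Meebo
patch and not the code in jutil.c: the names jid_is_fast_path() and
jid_ascii_lower_bare() are hypothetical, and the character set is copied
from the list quoted above. Lowercasing everything before the first '/'
is an assumption about how "only lowercase the node and domain" would be
applied while leaving the resource untouched.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not libpurple API): returns true when every byte
 * of str is in the set a-z A-Z 0-9 @ / { | } ~ . [ \ ] ^ _ ;
 * An empty or NULL string is sent down the slow path. */
static bool jid_is_fast_path(const char *str)
{
    const unsigned char *p;

    if (str == NULL || *str == '\0')
        return false;

    for (p = (const unsigned char *)str; *p != '\0'; p++) {
        unsigned char c = *p;

        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
            (c >= '0' && c <= '9'))
            continue;

        switch (c) {
        case '@': case '/': case '{': case '|': case '}': case '~':
        case '.': case '[': case '\\': case ']': case '^': case '_':
        case ';':
            continue;   /* allowed punctuation, keep scanning */
        default:
            return false;   /* anything else needs the full checks */
        }
    }
    return true;
}

/* Hypothetical helper: ASCII-lowercase the node and domain in place,
 * i.e. everything before the first '/', leaving the resource as-is.
 * Only meaningful for strings that passed jid_is_fast_path(). */
static void jid_ascii_lower_bare(char *jid)
{
    char *p;

    for (p = jid; *p != '\0' && *p != '/'; p++) {
        if (*p >= 'A' && *p <= 'Z')
            *p = (char)(*p - 'A' + 'a');
    }
}
```

A caller would run jid_is_fast_path() first; if it returns false, it
would fall back to the existing g_utf8_normalize() and stringprep path
unchanged, so non-ASCII JIDs pay only the cost of the short scan.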


--
Peter Saint-Andre