Using different hash algorithm in purple_util_get_image_checksum()

Ethan Blanton elb at pidgin.im
Tue Jun 30 22:56:42 EDT 2009


Mark Doliner spake unto us the following wisdom:
> The purple_util_get_image_checksum() function in libpurple/util.c
> currently uses SHA-1 to generate a checksum for a chunk of image data.
>  SHA-1 is a cryptographic hash function, which means it's hard for
> someone to engineer a chunk of data that matches a given hash.  It
> also means it's slow.
> 
> Do we need to be using a cryptographic hash function here?  This hash
> function is one of the more expensive parts of libpurple.  I think
> it's called once for each buddy icon we receive.  Adler-32 is much
> faster when you're not concerned about security (it's maybe 8 times
> faster than SHA-1).  zlib contains an Adler-32 implementation.  I
> think GLib's g_string_hash() function is also pretty fast (but not as
> fast as Adler-32 when hashing image data).  I haven't really
> investigated what problems we would have switching hash functions... I
> think we would have to migrate or purge buddy icons from
> ~/.purple/icons/, because the icon filename is the hash.  And there
> might be other problems.
> 
> But, uh, how to people feel about this change?

A 32-bit hash brings collisions into the realm of possibility, as it
requires only about 2^16 icons in the store for the probability of
collision to rise to about 50%.

We don't necessarily need a SHA-sized hash, but we need more than
Adler32.  I suggest, however, that there's not much in between that
will have *practical* implementations which are faster than SHA.  (SHA
implementations in crypto libraries are heavily optimized, as are many
Adler32 implementations.  I am unaware of, for example, a 64-bit
checksum or hashing function with similar widespread optimization.)
It's possible that AES-128 would demonstrate benefit, or even DES with
a 56-bit key; you might want to benchmark.

(In reality, I submit that we *do* need a cryptographically secure hash
 function.  While this is not always the case for content-addressed
 storage, in our case we are storing objects which are provided by
 untrusted users from the network.  It is relatively easy to
 reverse-collide an Adler32 checksum, and possible for an individual
 to reverse-collide a DES "hashed" block.  This would mean that a
 malicious user with a timing advantage could pollute your icon store
 and prevent legitimate buddy icons from being fetched.  I sketch:

 Mallory sees that Alice has changed her buddy icon, but Bob is not
 online.  Mallory computes an icon which collides with Alice's in the
 Pidgin buddy icon CAS.  When Bob comes online, Mallory rapidly sends
 Bob this new buddy icon information as his own buddy icon, storing it
 in Bob's CAS.  When Bob engages in conversation with Alice, Mallory's
 buddy icon is now shown for Alice.

 The saving grace here is ... nobody cares that much about buddy
 icons.  Particularly now that Hulu is around.)

Ethan

-- 
The laws that forbid the carrying of arms are laws [that have no remedy
for evils].  They disarm only those who are neither inclined nor
determined to commit crimes.
		-- Cesare Beccaria, "On Crimes and Punishments", 1764
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 481 bytes
Desc: Digital signature
URL: <http://pidgin.im/pipermail/devel/attachments/20090630/b39d21ed/attachment.sig>


More information about the Devel mailing list