GSoC idea: Echo cancellation for voice calls.

Stefan Kriwanek dev at
Tue Apr 3 19:19:40 EDT 2012


I believe an apology is in order for not having been active for two
weeks now. My plans got disrupted both by my becoming ill and by a nasty
bug in Ubuntu 12.04 on my would-be development machine that broke
voice/video completely.
Sorry for being silent during this time.

I still do want to pursue my proposed idea. In fact, I now believe the
mentioned bug gives me the opportunity to round out the project. Or, put
another way, to do something actually useful while familiarizing myself
with the relevant details of libpurple. So, in my mind, the rough steps
of the project would look as follows (details below):

- Extend libpurple to support both Farstream2 and Farsight
- Support for echo cancellation
- Various minor voice/video related usage improvements

Extend libpurple to support both Farstream2 and Farsight

libpurple uses the Farsight2 framework as its backend to encode and
decode media streams and to send them over the network. Farsight was
very recently (2012-02-20) renamed to Farstream and its developers used
the renaming to somewhat adjust the API, breaking
backwards-compatibility. This is what crashed VV in Ubuntu 12.04.

Thus, libpurple will need to be adjusted to work with the new Farstream
API. Of course, distributions will eventually take care of that, and
some are already doing so (see the bug mentioned above), which means we
can build upon that work. However, IMHO the libpurple source needs to
support either API, selected at build time, at least until most current
distributions are out of date. Choosing only one of them would either
cut users of current distributions off from libpurple updates or, if
libpurple is not updated in time, cause unnecessary duplicate (and
mutually incompatible) work at the distributions.

On the implementation side, I believe the best way to proceed is to
first get libpurple to work with the new Farstream API, and then to
teach the build system how to detect which library is present and use
the result to insert #if/#else blocks where necessary. From what I have
learned so far, the API changes are way too small to justify the
"cleaner" duplication approach.
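
As a rough illustration of the #if/#else approach, here is a minimal,
self-contained sketch. HAVE_FARSIGHT is an assumed macro that the build
system would define when it finds only the old library, and
backend_add_participant() is a hypothetical stub. Neither is an actual
Farsight, Farstream or libpurple name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * HAVE_FARSIGHT is an assumption: the build system would define it when
 * the old Farsight2 library is detected instead of Farstream.
 */
#ifdef HAVE_FARSIGHT
#define MEDIA_BACKEND_NAME "farsight2"
#else
#define MEDIA_BACKEND_NAME "farstream"
#endif

/* Report which backend this build was compiled against. */
static const char *media_backend_name(void)
{
    return MEDIA_BACKEND_NAME;
}

/*
 * Hypothetical wrapper around a call whose signature differs between
 * the two APIs. The real library calls are deliberately left out.
 */
static int backend_add_participant(const char *name)
{
    assert(name != NULL);
#ifdef HAVE_FARSIGHT
    /* The old-API variant of the call would go here. */
#else
    /* The new-API variant of the call would go here. */
#endif
    return 1; /* this stub always pretends to succeed */
}
```

Callers would only ever see the wrapper, so the conditional code stays
confined to a few small places.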

Support for echo cancellation

I already wrote a lengthy rationale for the idea on March 20th, so I'll
only try to reply to Mark's and Ethan's comments here.

> One concern is that this might be re-implementing the wheel.
> The blog post you linked to mentions that they implemented echo
> cancellation as a set of patches to Pulse Audio.  If those patches
> were accepted then it would be easy for Pidgin to take advantage of
> it.  But maybe their patches weren't accepted?  And of course, not
> everyone uses Pulse Audio (I don't).

Doing some further research, I now know of two "ready-to-use" means of
echo cancellation:
- The Pulse Audio patches did get accepted and it should be easy to make
use of them.
- The Speex project has had a DSP (digital signal processing) library
independent of its Speex codec since 2007-12-11 (so most distributions
should ship it today). I suppose this one will take more time to
integrate into Pidgin. It introduces a new dependency, which in my
opinion should be optional even within the optional media support. More
importantly, the library needs to process the audio streams in a timely
and synchronised manner, which seems to be a delicate business given
that it doesn't sit as near to the hardware as Pulse Audio does and
might prefer certain bitrates.
I propose Pidgin should use the Pulse Audio approach if possible and
fall back to the Speex approach if necessary.
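
The proposed preference order can be sketched as follows. All names
here (AecBackend, choose_aec_backend and the two availability flags) are
hypothetical and only illustrate the decision. How availability would
actually be detected is left open.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers for illustration; not part of libpurple. */
typedef enum {
    AEC_NONE,
    AEC_PULSEAUDIO,
    AEC_SPEEXDSP
} AecBackend;

/*
 * Prefer Pulse Audio's echo cancellation, fall back to the Speex DSP
 * library, and run without echo cancellation if neither is available.
 */
static AecBackend choose_aec_backend(bool have_pulse_aec, bool have_speexdsp)
{
    if (have_pulse_aec)
        return AEC_PULSEAUDIO;
    if (have_speexdsp)
        return AEC_SPEEXDSP;
    return AEC_NONE;
}

/* Human-readable name, e.g. for debug logging. */
static const char *aec_backend_name(AecBackend b)
{
    switch (b) {
    case AEC_PULSEAUDIO: return "pulseaudio";
    case AEC_SPEEXDSP:   return "speexdsp";
    case AEC_NONE:       return "none";
    }
    assert(0 && "unknown backend");
    return "none";
}
```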

> I would also like to make sure this feature is surfaced to the user in
> appropriate ways.  Ideally it would be automatic and the user wouldn't
> know it was happening and wouldn't need the ability to turn it off or
> on.  But if it doesn't work flawlessly, maybe we would need to give
> the user an option to disable it?

I fully agree! Let me add:
I don't believe the ability to turn it off will be necessary. It could
in fact confuse the user: in a situation where a conversation is still
possible without AEC, enabling AEC improves the quality for the _remote_
participant to a much greater extent than for the local one, so the
local user would hardly notice what the switch does.
I do, however, suggest the ability to turn it off completely at build
time, because on sufficiently low-end devices echo cancellation might
eat up a noticeable portion of CPU cache and time. I can only guess,
from my very limited academic experience with high performance
computing, that low-end Android phones might be such devices.

> Or if it only works when using
> Pulse Audio, maybe we need to provide messaging to the user that they
> might get better audio quality if they use Pulse Audio?

Yes. As I see it, there's a good chance that with Pulse Audio present,
selecting ALSA will get you Pulse Audio's ALSA emulation anyway, which
is yet another reason to prefer PA if present.

Regarding my quoting of "ready to use" above: as I learn more and more
of the details of AEC and its implementation, I am led to believe that
using it might be pretty involved. One fact I haven't mentioned so far
is that AEC for voice calls needs double-talk detection so as not to
cause havoc when both participants speak at the same time. Getting it
all to work together might well fill the 10 weeks.
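
To give an idea of what double-talk detection involves, here is a
minimal sketch of one classic approach, a Geigel-style detector: the
local speaker is assumed to be talking whenever the microphone sample is
louder than a fraction of the recent loudspeaker peak. This is only an
illustration and not necessarily what Pulse Audio or Speex do
internally. The function name and the threshold value are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Geigel-style double-talk check for a single microphone sample.
 *   far:       the most recent n far-end (loudspeaker) samples
 *   mic:       the current near-end (microphone) sample
 *   threshold: assumed tuning value; 0.5 is a commonly quoted choice
 * Returns true if the local participant appears to be speaking.
 */
static bool geigel_double_talk(const short *far, size_t n,
                               short mic, double threshold)
{
    assert(far != NULL && n > 0);

    /* Find the largest absolute far-end sample in the window. */
    int far_peak = 0;
    for (size_t i = 0; i < n; i++) {
        int a = abs((int)far[i]);
        if (a > far_peak)
            far_peak = a;
    }

    /* Pure echo should stay below threshold * far_peak. */
    return abs((int)mic) > threshold * far_peak;
}
```

While the check returns true, an echo canceller would typically freeze
the adaptation of its filter so that local speech does not corrupt the
echo estimate.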

Just in case I'm wrong and there'll be time left at the end of the
project, I'd like to invite you to brainstorm about

Various minor voice/video related usage improvements

Ethan suggested some:
> However, I think there are probably a number
> of other, related features you could come up with to round out the
> project; things like equalization, monitor volume, etc.

Ethan, I'm not sure what you mean by "monitor volume". There's already a
progress bar abused to show the current volume level (both local and
remote speech).

Regarding equalization, I'm not sure that's Pidgin's job; it would be
better located somewhere in the 3 or 4 layers down to the hardware.
Besides, it's unfortunately impossible to do automatically: one could
only equalize the complete speakers-microphone chain, and then again,
echo would get in the way. Please correct me if I'm wrong or you had
something else in mind!

I'd find it exciting to implement automatic noise level detection.
Currently the user can set a cut-off volume manually, whose optimal
value will depend on the microphone gain set. On the other hand, that's
another "open" task requiring more research and I get the feeling my two
"big" goals are open enough as they are.

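One simple way such a noise level detection could work is sketched
below: track a noise floor that drops immediately to the level of any
quieter frame and creeps upwards only slowly otherwise. This is just an
illustration under stated assumptions. The function names and the rise
factor are hypothetical and nothing here reflects existing Pidgin code.

```c
#include <assert.h>
#include <stddef.h>

/* Mean absolute amplitude of one frame of 16-bit samples. */
static double frame_level(const short *samples, size_t n)
{
    assert(samples != NULL && n > 0);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i] < 0 ? -(double)samples[i] : (double)samples[i];
    return sum / (double)n;
}

/*
 * Update the noise floor estimate with the level of a new frame.
 *   rise: assumed tuning value in (0, 1); small values make the floor
 *         follow louder frames only slowly, so speech barely raises it
 */
static double update_noise_floor(double floor, double level, double rise)
{
    if (level < floor)
        return level;                     /* quieter frame: adopt at once */
    return floor + rise * (level - floor); /* louder frame: drift upwards */
}
```

The cut-off volume could then be derived from the floor, for example as
the floor multiplied by some safety margin.
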
Also, I noticed a bug (filed at Ubuntu) where opening a video session is
not possible if only one side has a webcam. No idea if it's still
present, as I didn't have the time to test it, but fixing it seems
useful and quick.

Any further ideas?

Stefan Kriwanek
