[GSOC] Automated usage statistic collection

Sanket Agarwal sanket at sanketagarwal.com
Sun Apr 1 03:00:21 EDT 2012

On Sat, Mar 31, 2012 at 12:00 AM, Sanket Agarwal
<sanket at sanketagarwal.com> wrote:
> Hi,
> On Fri, Mar 30, 2012 at 2:38 AM, Sanket Agarwal
> <sanket at sanketagarwal.com> wrote:
>> Hi,
>> I was browsing through some of the ideas (after a long time, in fact) and
>> found the above interesting for multiple reasons.
>> * It has a wide scope for adoption in Pidgin: it is always very useful to
>> see which features are most used, so it should be widely accepted.
>> * The bug reporter can be used to detect common/critical bugs at an early
>> stage and provide a better remedial procedure to users.
>> I was going through the tasks and had the following final product that could
>> be targeted:
>> * We can possibly have something like [1] to report various statistics
>> Adium uses something called Sparkle [3] to handle their update engine and
>> anonymous usage statistics collector. It's a very Mac-oriented solution,
>> which is used for statistics and automated software updates. It'll be nice
>> if the community could suggest a few solutions while I am searching.
> I looked over Sparkle; its usage statistics collection works as follows:
> * Sparkle's client side monitors usage for a week, then it uploads
> the information to a custom appcast server
> * The appcast server is a simple php script accepting GET requests (no
> specific auth methods)
> * They suggest using a GET request to update the usage statistics
> Issue with their implementation
> * Currently there is no way for the server to determine that the
> GET was sent by an actual client rather than a bot.
> We could use the above mechanism for our purposes. Of course, this means
> we will need to delve into server-side scripting, which will not exactly
> be part of Pidgin's source code. We could have something like the
> following mechanism for updating statistics:
> Mechanism
> ========
> 1. Collect information locally for a fixed period of time. A week
> seems sufficient to me
> * We could start by having the following:
> a) CPU Speed
> b) Processor type
> c) Operating System
> d) RAM size
> e) Chat accounts used (their frequency over a fixed duration, say weekly)
> f) Language
> 2. Post the information on our servers
> A typical submission of a usage statistic could be something like:
> GET http://pidgin.im/usageStats.php?CPU_speed=xx&Proc_type=xx&Operating_System=xx&RAM_Size=xx&jabber=xx&...&language_en=xx&..
> We would need to define how variable parameters like chat accounts
> used are counted: with a weighted or an unweighted scheme. For example,
> if a user is using multiple XMPP/Gtalk/Jabber accounts, do they
> contribute as 1 or as 'n' when there are 'n' such accounts through
> which the user is logged in, and so on.
> 3. Design a decent Server side statistics loader
> We could use some of the HTML/CSS charting techniques listed at [4]
> (I could search more on that) to make the final display look
> presentable.
> Let me know what you think about it.
> Cheers
> --Sanket
>> [1] -- http://www.adium.im/sparkle/
>> [2] -- http://code.google.com/p/google-breakpad/wiki/LinuxStarterGuide
>> [3] -- http://sparkle.andymatuschak.org/
> [4] -- http://speckyboy.com/2009/02/04/16-usable-css-graph-and-bar-chart-tutorials-and-techniques/
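
To make step 2 of the quoted mechanism a bit more concrete, here is a
rough sketch (in Python only for brevity, not Pidgin code) of how the
weighted and unweighted counting schemes would change the submitted
query string. The endpoint is the one from the example URL above; the
function names and the shape of the input data are my own assumptions.

```python
from collections import Counter
from urllib.parse import urlencode

# Endpoint taken from the example URL in the proposal; it is only a placeholder.
STATS_URL = "http://pidgin.im/usageStats.php"


def count_protocols(accounts, weighted=True):
    """Count logged-in accounts per protocol.

    weighted=True:  'n' accounts on one protocol contribute 'n'.
    weighted=False: a protocol contributes 1 however many accounts use it.
    """
    counts = Counter(accounts)
    if not weighted:
        counts = Counter({protocol: 1 for protocol in counts})
    return counts


def build_stats_url(system_info, accounts, weighted=True):
    """Build the GET URL from system info plus per-protocol counts."""
    params = dict(system_info)
    # Sort protocol names so the same data always yields the same URL.
    params.update(sorted(count_protocols(accounts, weighted).items()))
    return STATS_URL + "?" + urlencode(params)


info = {"Operating_System": "Linux", "RAM_Size": 4096}
accounts = ["jabber", "jabber", "irc"]
print(build_stats_url(info, accounts, weighted=True))
print(build_stats_url(info, accounts, weighted=False))
```

With two Jabber accounts, the weighted scheme reports jabber=2 and the
unweighted one reports jabber=1; everything else in the URL is identical.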

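Similarly, the "collect locally for a week, then submit" part of step 1
could be tracked roughly as below. This is only a sketch under my own
assumptions: the state is a plain in-memory dict (persisting it between
sessions is left out), "usage" is counted as logins per protocol, and
all function names are hypothetical.

```python
import time

WEEK_SECONDS = 7 * 24 * 60 * 60  # the one-week period suggested in step 1


def record_login(state, protocol, now=None):
    """Note one login on `protocol`; the first event starts the window."""
    now = time.time() if now is None else now
    state.setdefault("window_start", now)
    logins = state.setdefault("logins", {})
    logins[protocol] = logins.get(protocol, 0) + 1
    return state


def report_due(state, now=None, period=WEEK_SECONDS):
    """True once a full collection period has passed since the first event."""
    now = time.time() if now is None else now
    return "window_start" in state and now - state["window_start"] >= period


def take_report(state):
    """Return the collected counts and reset the state for the next window."""
    report = dict(state.get("logins", {}))
    state.clear()
    return report


state = {}
record_login(state, "jabber", now=0)
record_login(state, "jabber", now=100)
record_login(state, "irc", now=200)
if report_due(state, now=WEEK_SECONDS):
    print(take_report(state))
```

The client would call report_due() at startup or on a timer and only
then build and send the GET request.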

I have prepared an initial draft of the application at:
http://goo.gl/qDHFL. Could you please have a look and get back to me
with any suggestions? I have tried to keep things simple for now.

Let me know.

