[GSOC] Automated usage statistic collection

Sanket Agarwal sanket at sanketagarwal.com
Fri Mar 30 14:30:46 EDT 2012


Hi,
On Fri, Mar 30, 2012 at 2:38 AM, Sanket Agarwal
<sanket at sanketagarwal.com> wrote:
> Hi,
>
> I was browsing through some of the ideas (after a long time, in fact) and
> found the above interesting for multiple reasons.
>
> * It has wide scope for adoption in Pidgin; it is always very useful to see
> which features are used most, so it should find wide acceptance.
> * The bug reporter can be used to detect common/critical bugs at an early
> stage and provide a better remedial procedure to users.
>
> I was going through the tasks and had the following final product that could
> be targeted:
>
> * We can possibly have something like [1] to report various statistics
>
> Adium uses something called Sparkle [3] to handle its update engine and
> anonymous usage statistics collector. It's a very Mac-oriented solution,
> which is used for statistics and automated software updates. It'll be nice
> if the community could suggest a few solutions while I am searching.

I looked over Sparkle; its usage statistics collection works as follows:
* Sparkle's client side monitors usage for a week, then uploads
the information to a custom appcast server
* The appcast server is a simple PHP script accepting GET requests (no
specific auth methods)
* They suggest using a GET request to update the usage statistics

Issue with their implementation:
* Currently the server has no way to determine whether a
GET was sent by an actual client rather than a bot.

We could use the above mechanism for our purposes. Of course, this means
we will need to delve into server-side scripting, which will not exactly
be part of Pidgin's source code. We could use something like the
following mechanism for updating statistics:

Mechanism
=========
1. Collect information locally for a fixed period of time. A week
seems sufficient to me.

* We could start by having the following:
a) CPU Speed
b) Processor type
c) Operating System
d) RAM size
e) Chat accounts used (their frequency over a fixed duration, say weekly)
f) Language
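
A rough sketch of what step 1 might look like, written in Python purely for
illustration (Pidgin itself is C, and every name below is hypothetical). It
keeps a per-protocol login count and reports when the one-week window has
elapsed. CPU speed and RAM size are left out because the Python standard
library has no portable call for them; a real client would need
platform-specific code there.

```python
import locale
import platform
import time
from collections import Counter

WEEK_SECONDS = 7 * 24 * 60 * 60  # the one-week collection period proposed above


class UsageCollector:
    """Hypothetical local collector; not an existing Pidgin API."""

    def __init__(self, started_at=None):
        self.started_at = time.time() if started_at is None else started_at
        self.account_logins = Counter()  # protocol name -> logins seen

    def record_login(self, protocol):
        """Note one login through the given protocol (e.g. "jabber")."""
        self.account_logins[protocol] += 1

    def ready_to_submit(self, now=None):
        """True once the collection period has elapsed."""
        now = time.time() if now is None else now
        return now - self.started_at >= WEEK_SECONDS

    def snapshot(self):
        """Return the fields the standard library can report portably."""
        try:
            lang = locale.getlocale()[0] or "unknown"
        except ValueError:  # raised for locale settings Python cannot parse
            lang = "unknown"
        return {
            "Proc_type": platform.machine() or "unknown",
            "Operating_System": platform.system() or "unknown",
            "language": lang,
            "accounts": dict(self.account_logins),
        }
```

The start time would have to be stored on disk so that the week survives
client restarts; that part is not shown.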

2. Post the information to our servers

A typical submission of a usage statistic could be something like:
GET http://pidgin.im/usageStats.php?CPU_speed=xx&Proc_type=xx&Operating_System=xx&RAM_Size=xx&jabber=xx&...&language_en=xx&..
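
To make the request format concrete, here is a small Python sketch that
builds such a URL from a dictionary. The parameter names and the endpoint
come from the example above; the values and their units (MHz, MB) are
placeholders I made up, and build_stats_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Endpoint from the example above; a proposal, not a live service.
STATS_URL = "http://pidgin.im/usageStats.php"


def build_stats_url(stats, base_url=STATS_URL):
    """Return the GET URL for one submission, with values percent-encoded."""
    return base_url + "?" + urlencode(stats)


example = {
    "CPU_speed": 2400,            # placeholder value, assumed MHz
    "Proc_type": "x86_64",
    "Operating_System": "Linux",
    "RAM_Size": 4096,             # placeholder value, assumed MB
    "jabber": 2,
    "language_en": 1,
}
url = build_stats_url(example)
```

Sending it would then be a single urllib.request.urlopen(url) call, or the
equivalent in whatever HTTP library the client already uses.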

We would need to define how variable parameters such as chat accounts
used are counted, using either a weighted or an unweighted scheme. For
example, if a user is logged in through 'n' XMPP/Gtalk/Jabber accounts,
do those accounts contribute 1 or 'n' to the count? The same question
applies to the other protocols.
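
The two counting schemes could look roughly like this in Python. The
function name and the input format (one protocol name per logged-in
account) are assumptions made for the sketch.

```python
from collections import Counter


def count_protocols(accounts, weighted=True):
    """Count protocols for one user.

    accounts: list of protocol names, one entry per logged-in account.
    weighted=True  -> each account counts, so 'n' accounts contribute 'n'.
    weighted=False -> each protocol contributes 1 regardless of 'n'.
    """
    if weighted:
        return dict(Counter(accounts))
    return dict.fromkeys(accounts, 1)


accounts = ["jabber", "jabber", "jabber", "irc"]
weighted = count_protocols(accounts, weighted=True)     # jabber counts 3 times
unweighted = count_protocols(accounts, weighted=False)  # jabber counts once
```

The weighted form says more about how heavily a protocol is used, while
the unweighted form answers "how many users use this protocol at all".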

3. Design a decent server-side statistics loader

We could use some of the HTML/CSS chart techniques listed at [4] (I could
search for more) to make the final display look good.
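
Before any charting, the server has to turn the stored requests into
totals. A minimal Python sketch of that aggregation follows, assuming the
submissions are available as the raw GET URLs. The function names are
hypothetical, and the real script could just as well be PHP like
Sparkle's.

```python
from collections import Counter, defaultdict
from urllib.parse import parse_qsl, urlsplit


def tally(urls):
    """Count how often each value was reported for each parameter."""
    totals = defaultdict(Counter)
    for url in urls:
        for key, value in parse_qsl(urlsplit(url).query):
            totals[key][value] += 1
    return totals


def percentages(counter):
    """Convert one parameter's counts into percentage shares for a chart."""
    total = sum(counter.values())
    return {value: 100.0 * n / total for value, n in counter.items()}


# Made-up submissions, for illustration only.
submissions = [
    "http://pidgin.im/usageStats.php?Operating_System=Linux",
    "http://pidgin.im/usageStats.php?Operating_System=Linux",
    "http://pidgin.im/usageStats.php?Operating_System=Linux",
    "http://pidgin.im/usageStats.php?Operating_System=Windows",
]
os_share = percentages(tally(submissions)["Operating_System"])
```

The resulting shares are what a CSS bar chart from [4] would take as bar
widths.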

Let me know what you think about it.

Cheers
--Sanket
> [1] -- http://www.adium.im/sparkle/
> [2] -- http://code.google.com/p/google-breakpad/wiki/LinuxStarterGuide
> [3] -- http://sparkle.andymatuschak.org/
[4] -- http://speckyboy.com/2009/02/04/16-usable-css-graph-and-bar-chart-tutorials-and-techniques/



