Meego Wiki

Metrics/Dashboard

From MeeGo wiki
Revision as of 18:02, 19 January 2011 by Dneary (Talk | contribs)

Community Metrics Dashboard

The goal is to provide a web page summarising metrics about various aspects of the MeeGo project. The data should update regularly: depending on the metric, either in real time or automatically on a fixed schedule.

The dashboard will track the following community resources, ideally:

  • Drupal members
  • Mailing lists (members, posts, threads)
  • gitorious (commits, employer details for committers) - should use Jon Corbet's scripts, like those used for the LF yearly kernel data.
  • Wiki (edits, new pages)
  • Forums (members, posts)
  • IRC (total comments, people on channel)
  • Transifex (Languages, translators, strings translated)
  • Community OBS (uploads, users)
  • SDK downloads (potentially extrapolated from meego.com)

The data should also be available for custom reports, for use and analysis in the monthly MeeGo Metrics report published by User:DawnFoster.

To fulfill these goals, the dashboard will gather data from the various resources into a centralised database, using some sort of Business Intelligence platform that includes ETL for data acquisition and storage, and a reporting service for generating reports and dashboards. A web page will provide a view into this database with predefined reports.
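To make the intended data flow concrete, here is a minimal sketch of the extract/transform/load/report cycle, using Python and SQLite as a stand-in for the real BI platform. The table name `metrics`, its columns, and the sample records are all assumptions made up for illustration; they are not an agreed schema or real MeeGo data.

```python
import sqlite3


def extract():
    # Hypothetical raw records, as they might arrive from different resources.
    return [
        {"source": "wiki", "metric": "edits", "day": "2011-01-17", "value": "42"},
        {"source": "forum", "metric": "posts", "day": "2011-01-17", "value": "17"},
        {"source": "wiki", "metric": "edits", "day": "2011-01-18", "value": "35"},
    ]


def transform(rows):
    # Normalise every record to one well-understood shape with typed values.
    return [(r["source"], r["metric"], r["day"], int(r["value"])) for r in rows]


def load(conn, rows):
    # The primary key plus INSERT OR REPLACE makes repeated loads idempotent,
    # so a scheduled import can safely re-run over the same period.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics ("
        "source TEXT, metric TEXT, day TEXT, value INTEGER, "
        "PRIMARY KEY (source, metric, day))"
    )
    conn.executemany("INSERT OR REPLACE INTO metrics VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def report(conn):
    # A predefined report: total per resource and metric.
    return conn.execute(
        "SELECT source, metric, SUM(value) FROM metrics "
        "GROUP BY source, metric ORDER BY source, metric"
    ).fetchall()
```

In a real deployment the ETL tool would replace `extract`, `transform` and `load`, and the reporting engine would replace `report`; the sketch only shows how the pieces hand data to each other.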

Candidate reporting solutions:

The following are essentially ETL engines, and do not provide reporting or dashboard functionality:

MuleSoft is an open source ESB, but does not seem suited to our needs. The field is thus narrowed to Pentaho and JasperReports.

For each community resource, we need to work out how to get the data into a usable form, come up with appropriate queries for the metrics reports, and finally present the results on a web page.
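As an example of getting one resource into a usable form, the following sketch computes mailing list metrics (posts and new threads per month, distinct posters) from an mbox archive using only the Python standard library. It is an illustration under stated assumptions, not the planned import path: the archive path is hypothetical, and counting every message without an In-Reply-To header as a new thread is only an approximation.

```python
import mailbox
from collections import Counter
from email.utils import parseaddr, parsedate_to_datetime


def list_metrics(messages):
    """Summarise an iterable of email messages."""
    posts = Counter()    # posts per month, keyed "YYYY-MM"
    threads = Counter()  # new threads per month (approximation, see below)
    posters = set()      # distinct sender addresses
    for msg in messages:
        try:
            month = parsedate_to_datetime(msg["Date"]).strftime("%Y-%m")
        except (TypeError, ValueError):
            continue  # skip messages with a missing or unparsable date
        posts[month] += 1
        posters.add(parseaddr(msg["From"] or "")[1].lower())
        # Assumption: a message without In-Reply-To starts a new thread.
        if not msg["In-Reply-To"]:
            threads[month] += 1
    return {"posts": dict(posts), "threads": dict(threads), "posters": len(posters)}


def mbox_metrics(path):
    # path is a hypothetical local mbox archive of one list.
    return list_metrics(mailbox.mbox(path))
```

Each resource would need a comparable small extractor; the resulting per-month counts are the kind of rows that would be loaded into the central database.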

Business intelligence engines

The area of Business Intelligence is littered with acronyms. Here's a quick overview of the main ones, and how they all fit together.

BI
Business Intelligence - general name for any middleware which allows you to query data about business processes (sales, inventory, etc.) and get overviews of that data
ETL
Extract, Transform, and Load - the process of extracting data from a data source (database, screen scraping, text file parsing, whatever), transforming it to a well understood format, and loading it into your BI engine database or data warehouse. Good ETL solutions provide a nice way for you to connect another database and have new data pulled in at regular intervals, define views into the source data store which you can then query within your BI engine, etc. Pentaho's ETL, Kettle, and JasperETL, used by JasperReports, both provide (kind of) straightforward ways to hook into a MySQL database.
ESB
Enterprise Service Bus - a middleware bus providing a single interface to applications on the front end and data stores on the back end. Often used to link up many front-end applications (e.g. library, student registration, employee payroll, syllabus management, accounting, supply-chain, student lodgement programmes, etc. in a university). Not really useful for us, as far as I can tell.
EAI
Enterprise Application Integration - using software to integrate different applications together. As far as I can tell, this is a meaningless catch-all phrase for anything from kludges to architected business intelligence solutions.
DW
Data Warehouse. Basically the same thing as a database, as far as I can tell, but bigger and more impressive sounding.
OLAP
On-Line Analytical Processing. Commonly used acronym for extracting data via multi-dimensional queries. Databases can be configured to provide the results of this kind of query. As far as I can tell this is mostly a buzzword - an "OLAP database" like Mondrian is basically the same thing as a database. "speed-of-thought" response times indeed.
Business reporting
An application which allows a graphical view of a database, and allows you to construct queries interactively, often using drag & drop. The results of these queries can then be plugged into graphing software for presentation in a dashboard.
Dashboard
Organised presentation of information in a web-page or other similar format allowing an at-a-glance overview of the situation for the data being measured.

So, in short, the community dashboard project will likely use an ETL tool to plug data into an OLAP server, and then use a business reporting engine to query that data and present it in a dashboard.
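To illustrate what a "multi-dimensional query" amounts to in practice, here is a small sketch that aggregates one fact table along any chosen combination of dimensions. It uses plain SQLite rather than Mondrian or JasperOLAP, and the table `activity`, its dimensions and the sample rows are hypothetical names invented for the example.

```python
import sqlite3

# Dimensions along which the (hypothetical) activity facts can be sliced.
DIMENSIONS = ("resource", "month", "employer")


def make_cube(rows):
    # One fact table: each row is (resource, month, employer, events).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activity ("
        "resource TEXT, month TEXT, employer TEXT, events INTEGER)"
    )
    conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?)", rows)
    return conn


def rollup(conn, dims):
    """Sum events grouped by the requested dimensions."""
    for d in dims:
        if d not in DIMENSIONS:
            # Only whitelisted column names are ever placed in the SQL text.
            raise ValueError(f"unknown dimension: {d}")
    if not dims:
        return conn.execute("SELECT SUM(events) FROM activity").fetchall()
    cols = ", ".join(dims)
    sql = (
        f"SELECT {cols}, SUM(events) FROM activity "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(sql).fetchall()
```

For instance, `rollup(conn, ["resource"])` gives one total per resource, while `rollup(conn, ["resource", "month"])` drills down to a monthly breakdown of the same facts. A dedicated OLAP server mainly adds precomputation and caching on top of this kind of grouping.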

Comparison of candidate ETL/reporting solutions

Modules available:

Software | License | ETL | OLAP database | BI server | Reporting | Dashboard module
Pentaho | EPL | Kettle | Mondrian | Pentaho BI Platform | Pentaho Reporting | Community Dashboard Framework
Jaspersoft | AGPL v3 | JasperETL (Talend Open Studio) | JasperOLAP | JasperReports Server | iReports editor | No (commercial only)

Support for community applications:

Resource | ETL import | Pentaho reports | Jaspersoft reports
Drupal | MySQL import | Query | Export for Jasper
Bugzilla | MySQL import | Bugzilla analytics | Bugzilla reports with OLAP
MediaWiki | MySQL import | |
Transifex | ? | |
Mailman | MySQL import via mlstats | |
IRC | ? | |
Forum | ? | |
Web analytics | ? | |
git | Via gitdm? | |
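For the git row, where the import route is still an open question, the following sketch shows the basic idea of attributing commits to employers by the domain of the author's email address. It is not gitdm and does not reproduce its behaviour; the domain-to-employer mapping and the `(unknown)` bucket are hypothetical and would need to be maintained by hand.

```python
import subprocess
from collections import Counter

# Hypothetical mapping, for illustration only.
DOMAIN_TO_EMPLOYER = {
    "example.com": "Example Corp",
    "example.org": "Example Foundation",
}


def commits_by_employer(author_emails, domain_map=DOMAIN_TO_EMPLOYER):
    """Count commits per employer, given one author email per commit."""
    counts = Counter()
    for email in author_emails:
        email = email.strip().lower()
        if not email:
            continue  # ignore blank lines in the input
        domain = email.rpartition("@")[2]
        counts[domain_map.get(domain, "(unknown)")] += 1
    return counts


def author_emails(repo_path):
    # Reads one author email per commit from a local clone at repo_path.
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%ae"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()
```

Calling `commits_by_employer(author_emails("/path/to/clone"))` on a local clone (the path is a placeholder) would give per-employer commit counts that could be loaded alongside the other metrics.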