Meego Wiki
Views

Metrics/Gathering data

From MeeGo wiki
(Difference between revisions)
Jump to: navigation, search
(Add SSS import & parsing to metrics "Gathering data" page. Next up: mediawiki)
(IRC - SuperSeriousStats)
Line 89: Line 89:
=== Parsing the logs ===
=== Parsing the logs ===
-
To run sss, you will need to modify a number of fields in sss.conf. Notably, I had to change the filename format to match the log file names, and set log format to supybot. I recommend creating a separate config file for each channel you want to parse.
+
To run sss, you will need to modify a number of fields in sss.conf. Notably, I had to change the filename format to "*.Y-m-d.\lo\g" to match the log file names, and set log format to supybot. I recommend creating a separate config file for each channel you want to parse. Here is the appropriate sss.conf for #meego logs:
 +
<pre>
 +
#
 +
#    This file contains all of sss' settings along with their defaults.
 +
#    All values must be placed between double quotes, even empty ones.
 +
#
 +
#################################  Required  ###################################
-
sss has a mechanism to prevent it from re-importing log files for the same channel, but you can store summary data from several channels in one database.
+
channel = "#meego" # Name of the IRC channel.
 +
timezone = "UTC" # Timezone the logs are in. Used for time offset
 +
# calculations and conversions.
 +
# See http://php.net/manual/en/timezones.php
 +
db_host = "host" # IP address or FQDN of the MySQL server.
 +
db_port = "3306" # Port the MySQL server is listening on.
 +
db_user = "db_user" # MySQL user.
 +
db_pass = "db_password" # MySQL password.
 +
db_name = "db_name" # Name of the MySQL database used for sss.
 +
parser = "parser_supybot" # The parser to use depending on logfile format.
 +
# e.g. "parser_irssi" or "parser_eggdrop"
 +
 
 +
# This string contains the format of the date within a logfile filename.
 +
# Examples:
 +
#  filename: #chatroom.20030131 dateformat: *.Ymd
 +
#  filename: #chatroom.20030131 dateformat: \#c\h\atroo\m.Ymd
 +
#  filename: chatroom.log-31012003 dateformat: *.*-dmY
 +
#  filename: chatroom.log-31012003.gz dateformat: *.*-dmY.\g\z
 +
#  filename: chatroom.log-31012003.gz dateformat: *.*-dmY.*
 +
# See http://php.net/date_create_from_format for more specific syntax options.
 +
logfile_dateformat = "*.Y-m-d.\lo\g"
 +
</pre>
 +
 
 +
sss has a mechanism to prevent it from re-importing log files for the same channel, by storing dates for which it has already parsed files in the table "parse_history", but you can store summary data from several channels in one database.
To do the initial import, just run
To do the initial import, just run
-
  php sss.php -i /path/to/logs/
+
  php sss.php -c meego.conf -i ${logdir} -o report.html
-
And to do the nightly import, add this after the wget line above:
+
And to do the nightly import, put the following in a shell script, and call it from cron:
-
php sss.php -c channel.conf -i ${logdir}/%23meego.${logdate}.log
+
<pre>
 +
  logdate=`date +%Y-%m-%d -d yesterday`
 +
  logdir=/path/to/logfiles
 +
  sssdir=/path/to/sss-4.0
 +
 
 +
  /usr/bin/wget -P $logdir http://mg.pov.lt/meego-irclog/%23meego.${logdate}.log;
 +
  /usr/bin/php ${sssdir}/sss.php -c meego.conf -i ${logdir}/\#meego.${logdate}.log
 +
</pre>
=== Queries ===
=== Queries ===
-
A number of useful sss queries are already listed in [[Metrics/Dashboard#IRC | the Dashboard page]]
+
The sss report is built from a number of queries. A number of other useful sss queries are already listed in [[Metrics/Dashboard#IRC | the Dashboard page]]
 +
 
 +
 
 +
'''Activity by month'''
 +
<pre>
 +
select date_format(`date`, '%Y-%m') as `date`, sum(`l_total`) as `l_total`, sum(`l_night`) as `l_night`, sum(`l_morning`) as `l_morning`, sum(`l_afternoon`) as `l_afternoon`, sum(`l_evening`) as `l_evening` from `channel` group by year(`date`), month(`date`);
 +
</pre>
 +
 
 +
'''Activity by day over last 30 days'''
 +
 
 +
<pre>
 +
select `date`, `l_total`, `l_night`, `l_morning`, `l_afternoon`, `l_evening` from `channel` where `date` > DATE_SUB(CURDATE(),INTERVAL 30 DAY);
 +
</pre>

Revision as of 16:07, 4 July 2011

For each of the services we gather data for, here's a guide to getting that data:

Contents

Mailing lists - MLStats

Mailman mailing lists can be downloaded, parsed and stored in a MySQL database using MLStats.

The general idea is to point mlstats at the list archive page, and let it do the work of figuring out what to download.

We are carrying a small local patch to mlstats to ensure that it re-downloads the current month's archives and reparses them. The patch has been submitted upstream.

Once you have downloaded and unpacked MLStats 0.4, patched it with the patch above, and installed it with
setup.py install --prefix=/install/path
you will need to "prime the pump", and download and import the archives to all of the Maemo mailing lists.

The format of the mlstats command line is:

/path/to/mlstats --db-user=<username> --db-password=<password> http://lists.meego.com/pipermail/meego-announce/
The command line option
--no-report
suppresses the creation of a report after the import, useful for a cron job.

You can list the archives for a number of mailing lists together on the command line. The list of MeeGo mailing lists is here.

for list in meego-adaptation-intel-automotive meego-announce meego-architecture \
            meego-commits meego-community meego-dev meego-distribution-tools \
            meego-events meego-handset meego-il10n meego-inputmethods meego-it \
            meego-ivi meego-kernel meego-packaging meego-pm meego-porting \
            meego-python meego-qa meego-releases meego-sdk meego-security \
            meego-security-discussion meego-touch-dev meego-tv;
do
  /path/to/mlstats --no-report --db-user=<username> --db-password=<password> http://lists.meego.com/pipermail/${list} >> /tmp/output.txt
done;

This should be run every night through cron.

Mailing list graph over time

The SQL query (yes, a Big Hairy Beast) which can be used to create a report graphing the messages posted to each list, month by month, is this:

SELECT
     `messages`.`mailing_list_url` AS list,
     year(first_date) AS y,
     monthname(first_date) AS mon,
     month(first_date) AS m,
     date_format(first_date, '%M %Y') as monthstr,
     date_format(first_date,'%Y%m') as monthnum, 
     count(*) AS c
FROM
     `messages`
WHERE
     year(first_date)		 > 1979 and 
     mailing_list_url not like '%meego-commits%' and first_date<'2011-05-01'
GROUP BY
     `messages`.`mailing_list_url`,
     y,m
ORDER BY
     monthnum ASC,
     list ASC,
     c ASC

We use $monthnum to order the results over time, and $monthstr as a more meaningful X axis label. We eliminate all emails with bad Date headers and filter meego-commits out of our analysis, and group the messages by mailing list, to give the following graph as the final result (how to create a report will follow later):

Mlgraph.png

IRC - SuperSeriousStats

SuperSeriousStats is a super serious IRC log analyser, developed by Jos de Ruiter. It is written in PHP.

To use SuperSeriousStats, you need to install the php-cli command line interface. Once you have this, and the other dependencies for sss listed here, you will be able to run sss.

Before running it for the first time, initialise the database. You will need to download all of the IRC logs you wish to import first. Logs are expected in the Supybot format (example).

Getting the IRC logs

For the initial download, the following script should help get all the logs:

for y in 2010 2011; do
 for m in `seq -f '%02g' 1 12`; do
  for d in `seq -f '%02g' 1 31`; do
   wget http://mg.pov.lt/meego-irclog/%23meego.$y-$m-$d.log;
  done;
 done;
done;

Thereafter, you can add the following line in your cron job to get the previous day's logs:

   logdate= `date +%Y-%m-%d -d yesterday`
   wget -P $logdir http://mg.pov.lt/meego-irclog/%23meego.${logdate}.log;

Parsing the logs

To run sss, you will need to modify a number of fields in sss.conf. Notably, I had to change the filename format to "*.Y-m-d.\lo\g" to match the log file names, and set log format to supybot. I recommend creating a separate config file for each channel you want to parse. Here is the appropriate sss.conf for #meego logs:

#
#    This file contains all of sss' settings along with their defaults.
#    All values must be placed between double quotes, even empty ones.
#
#################################  Required  ###################################

channel = "#meego"		# Name of the IRC channel.
timezone = "UTC"	# Timezone the logs are in. Used for time offset
			# calculations and conversions.
			# See http://php.net/manual/en/timezones.php
db_host = "host"		# IP address or FQDN of the MySQL server.
db_port = "3306"	# Port the MySQL server is listening on.
db_user = "db_user"	# MySQL user.
db_pass = "db_password"		# MySQL password.
db_name = "db_name"		# Name of the MySQL database used for sss.
parser = "parser_supybot"	# The parser to use depending on logfile format.
			# e.g. "parser_irssi" or "parser_eggdrop"

# This string contains the format of the date within a logfile filename.
# Examples:
#   filename: #chatroom.20030131	dateformat: *.Ymd
#   filename: #chatroom.20030131	dateformat: \#c\h\atroo\m.Ymd
#   filename: chatroom.log-31012003	dateformat: *.*-dmY
#   filename: chatroom.log-31012003.gz	dateformat: *.*-dmY.\g\z
#   filename: chatroom.log-31012003.gz	dateformat: *.*-dmY.*
# See http://php.net/date_create_from_format for more specific syntax options.
logfile_dateformat = "*.Y-m-d.\lo\g"

sss has a mechanism to prevent it from re-importing log files for the same channel, by storing dates for which it has already parsed files in the table "parse_history", but you can store summary data from several channels in one database.

To do the initial import, just run

php sss.php -c meego.conf -i ${logdir} -o report.html

And to do the nightly import, put the following in a shell script, and call it from cron:

   logdate=`date +%Y-%m-%d -d yesterday`
   logdir=/path/to/logfiles
   sssdir=/path/to/sss-4.0

   /usr/bin/wget -P $logdir http://mg.pov.lt/meego-irclog/%23meego.${logdate}.log;
   /usr/bin/php ${sssdir}/sss.php -c meego.conf -i ${logdir}/\#meego.${logdate}.log

Queries

The sss report is built from a number of queries. A number of other useful sss queries are already listed in the Dashboard page


Activity by month

select date_format(`date`, '%Y-%m') as `date`, sum(`l_total`) as `l_total`, sum(`l_night`) as `l_night`, sum(`l_morning`) as `l_morning`, sum(`l_afternoon`) as `l_afternoon`, sum(`l_evening`) as `l_evening` from `channel` group by year(`date`), month(`date`);

Activity by day over last 30 days

select `date`, `l_total`, `l_night`, `l_morning`, `l_afternoon`, `l_evening` from `channel` where `date` > DATE_SUB(CURDATE(),INTERVAL 30 DAY);
Personal tools