Meego Wiki
Views

Metrics/Gathering data

From MeeGo wiki
< Metrics(Difference between revisions)
Jump to: navigation, search
(Queries)
 
(2 intermediate revisions not shown)
Line 1: Line 1:
 +
 +
=== Data sources ===
 +
 +
For SQL databases, the metrics server has access to the database server for MediaWiki, Bugzilla, and Drupal directly.
 +
 +
For the forum, [[#Forums | we integrate the CSV files currently being exported]], which provide the basic analytics we need.
 +
 +
Mailing lists are parsed by [http://forge.morfeo-project.org/projects/libresoft-tools/ MLStats]. We will use the resulting database directly in the dashboard. See [[#Mailing lists - MLStats]]
 +
 +
We may eventually parse git activity in repositories with [http://lwn.net/Articles/290957/ gitdm], before storing the results directly in a database. For the moment, though, we extract developer information from the meego-commits mailing list.
 +
 +
[[#IRC - SuperSeriousStats | IRC logs]] will be parsed with [http://code.google.com/p/superseriousstats/ superseriousstats], a PHP command line tool that parses IRC logs and stores the results in an SQL database.
 +
 +
We still need to figure out how to do data interchange with Transifex and OBS. Dimitris tells me that there are already [http://meego.transifex.net/stats/ some analytics] available on Transifex, and that there is a RESTful API available to query this data.
 +
For each of the services we gather data for, here's a guide to getting that data:
For each of the services we gather data for, here's a guide to getting that data:
Line 155: Line 170:
<pre>
<pre>
-
select `date`, `l_total`, `l_night`, `l_morning`, `l_afternoon`, `l_evening` from `channel` where `date` > DATE_SUB(CURDATE(),INTERVAL 30 DAY);
+
  select `date`, `l_total`, `l_night`, `l_morning`, `l_afternoon`, `l_evening` from `channel`
 +
  where `date` > DATE_SUB(CURDATE(),INTERVAL 30 DAY);
 +
</pre>
 +
 
 +
== Forums ==
 +
 
 +
Forum stats are available as a series of CSV files on [http://forums.meego.com/stats forums.meego.com], supplied monthly. We need to download the .csv files every month (just the latest ones), parse the CSV files into a database and generate a report from that.
 +
 
 +
=== Downloading CSV files ===
 +
 
 +
To get started and load up all of the old stats, run the following:
 +
 
 +
<pre>
 +
wget -nd -P <local dir for data> -r -l1 --no-parent -A.csv http://forum.meego.com/stats
 +
</pre>
 +
 
 +
This will download all CSV files to the local directory specified.
 +
 
 +
For the monthly refresh, we use wget in a cron script as follows:
 +
<pre>
 +
y=`/bin/date -d "1 month ago" +%Y`
 +
m=`/bin/date -d "1 month ago" +%m`
 +
 
 +
wget -nd -P <local dir for data> -r -l1 --no-parent -A "${y}${m}*.csv" http://forum.meego.com/stats
 +
 
 +
</pre>
 +
 
 +
That wget command line is worth explaining:
 +
* -nd: Don't create the remote directory structure when downloading the files locally
 +
* -P <directory>: Download files to the parent directory specified
 +
* -r: Recursively download
 +
* -l1: Limit to 1 level of directories (combining -r and -l1 allows us to download several files at the same time)
 +
* --no-parent: Ignore the .. link
 +
* -A "${y}${m}*.csv": Match filenames of the form "YYYYMM*.csv" - gives us the latest stats files only
 +
 
 +
=== Database schema ===
 +
 
 +
We created 7 tables, one for each of the statistics provided by the forum.
 +
 
 +
Here is the database schema:
 +
<pre>--
 +
-- Table structure for table `forum_cumulative_posts`
 +
--
 +
 
 +
DROP TABLE IF EXISTS `forum_cumulative_posts`;
 +
CREATE TABLE `forum_cumulative_posts` (
 +
  `month` int(11) NOT NULL,
 +
  `year` int(11) NOT NULL,
 +
  `forum` varchar(50) NOT NULL DEFAULT '',
 +
  `posts` int(11) DEFAULT NULL,
 +
  PRIMARY KEY (`month`,`year`,`forum`)
 +
);
 +
 
 +
--
 +
-- Table structure for table `forum_cumulative_threads`
 +
--
 +
 
 +
DROP TABLE IF EXISTS `forum_cumulative_threads`;
 +
CREATE TABLE `forum_cumulative_threads` (
 +
  `month` int(11) NOT NULL,
 +
  `year` int(11) NOT NULL,
 +
  `forum` varchar(50) NOT NULL DEFAULT '',
 +
  `threads` int(11) DEFAULT NULL,
 +
  PRIMARY KEY (`month`,`year`,`forum`)
 +
);
 +
 
 +
--
 +
-- Table structure for table `forum_hottest_threads`
 +
--
 +
 
 +
DROP TABLE IF EXISTS `forum_hottest_threads`;
 +
CREATE TABLE `forum_hottest_threads` (
 +
  `month` int(11) NOT NULL,
 +
  `year` int(11) NOT NULL,
 +
  `rank` int(11) NOT NULL DEFAULT '0',
 +
  `title` varchar(255) DEFAULT NULL,
 +
  `posts` int(11) DEFAULT NULL,
 +
  PRIMARY KEY (`month`,`year`,`rank`)
 +
);
 +
 
 +
--
 +
-- Table structure for table `forum_most_viewed_threads`
 +
--
 +
 
 +
DROP TABLE IF EXISTS `forum_most_viewed_threads`;
 +
CREATE TABLE `forum_most_viewed_threads` (
 +
  `month` int(11) NOT NULL,
 +
  `year` int(11) NOT NULL,
 +
  `rank` int(11) NOT NULL DEFAULT '0',
 +
  `title` varchar(255) DEFAULT NULL,
 +
  `views` int(11) DEFAULT NULL,
 +
  PRIMARY KEY (`month`,`year`,`rank`)
 +
);
 +
 
 +
--
 +
-- Table structure for table `forum_posts`
 +
--
 +
 
 +
DROP TABLE IF EXISTS `forum_posts`;
 +
CREATE TABLE `forum_posts` (
 +
  `month` int(11) NOT NULL,
 +
  `year` int(11) NOT NULL,
 +
  `forum` varchar(50) NOT NULL DEFAULT '',
 +
  `posts` int(11) DEFAULT NULL,
 +
  PRIMARY KEY (`month`,`year`,`forum`)
 +
);
 +
 
 +
--
 +
-- Table structure for table `forum_top_posters`
 +
--
 +
 
 +
DROP TABLE IF EXISTS `forum_top_posters`;
 +
CREATE TABLE `forum_top_posters` (
 +
  `month` int(11) NOT NULL,
 +
  `year` int(11) NOT NULL,
 +
  `rank` int(11) NOT NULL DEFAULT '0',
 +
  `member` varchar(50) DEFAULT NULL,
 +
  `posts` int(11) DEFAULT NULL,
 +
  PRIMARY KEY (`month`,`year`,`rank`)
 +
);
 +
 
 +
--
 +
-- Table structure for table `forum_top_thanked`
 +
--
 +
 
 +
DROP TABLE IF EXISTS `forum_top_thanked`;
 +
CREATE TABLE `forum_top_thanked` (
 +
  `month` int(11) NOT NULL,
 +
  `year` int(11) NOT NULL,
 +
  `rank` int(11) NOT NULL DEFAULT '0',
 +
  `member` varchar(50) DEFAULT NULL,
 +
  `thanks` int(11) DEFAULT NULL,
 +
  PRIMARY KEY (`month`,`year`,`rank`)
 +
);
 +
 
 +
</pre>
 +
 
 +
We import the files into the databases via <pre>LOAD DATA LOCAL INFILE</pre>
 +
 
 +
<pre>
 +
cd <local data directory>
 +
 
 +
if [ -f ${y}${m}_forum_cumulative_posts.csv ]; then
 +
  echo "Importing ${y}${m}_forum_cumulative_posts.csv";
 +
 
 +
  # forum,posts
 +
  mysql -u<db_user> -p<db_password> -h <host> <database> -e \
 +
        "LOAD DATA LOCAL INFILE '${y}${m}_forum_cumulative_posts.csv'
 +
        INTO TABLE forum_cumulative_posts
 +
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
 +
        IGNORE 1 LINES
 +
        (forum, posts)
 +
        set year=$y, month=$m"
 +
fi;
 +
 
 +
if [ -f ${y}${m}_forum_cumulative_threads.csv ]; then
 +
  echo "Importing ${y}${m}_forum_cumulative_threads.csv";
 +
 
 +
  # forum,threads
 +
  mysql -u<db_user> -p<db_password> -h <host> <database> -e \
 +
        "LOAD DATA LOCAL INFILE '${y}${m}_forum_cumulative_threads.csv'
 +
        INTO TABLE forum_cumulative_threads
 +
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
 +
        IGNORE 1 LINES
 +
        (forum, threads)
 +
        set year=$y, month=$m"
 +
fi;
 +
 
 +
if [ -f ${y}${m}_forum_posts.csv ]; then
 +
  echo "Importing ${y}${m}_forum_posts.csv";
 +
 
 +
  # forum,posts
 +
  mysql -u<db_user> -p<db_password> -h <host> <database> -e \
 +
        "LOAD DATA LOCAL INFILE '${y}${m}_forum_posts.csv'
 +
        INTO TABLE forum_posts
 +
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
 +
        IGNORE 1 LINES
 +
        (forum, posts)
 +
        set year=$y, month=$m"
 +
fi;
 +
 
 +
 
 +
 
 +
if [ -f ${y}${m}_hottest_threads.csv ]; then
 +
  echo "Importing ${y}${m}_hottest_threads.csv";
 +
 
 +
  # rank,title,posts
 +
  mysql -u<db_user> -p<db_password> -h <host> <database> -e \
 +
        "LOAD DATA LOCAL INFILE '${y}${m}_hottest_threads.csv'
 +
        INTO TABLE forum_hottest_threads
 +
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
 +
        IGNORE 1 LINES
 +
        (rank, title, posts)
 +
        set year=$y, month=$m"
 +
fi;
 +
 
 +
if [ -f ${y}${m}_most_viewed_threads.csv ]; then
 +
  echo "Importing ${y}${m}_most_viewed_threads.csv";
 +
 
 +
  # rank,title,views
 +
  mysql -u<db_user> -p<db_password> -h <host> <database> -e \
 +
        "LOAD DATA LOCAL INFILE '${y}${m}_most_viewed_threads.csv'
 +
        INTO TABLE forum_most_viewed_threads
 +
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
 +
        IGNORE 1 LINES
 +
        (rank, title, views)
 +
        set year=$y, month=$m"
 +
fi;
 +
 
 +
if [ -f ${y}${m}_top_posters.csv ]; then
 +
  echo "Importing ${y}${m}_top_posters.csv";
 +
 
 +
  # rank,member,posts
 +
  mysql -u<db_user> -p<db_password> -h <host> <database> -e \
 +
        "LOAD DATA LOCAL INFILE '${y}${m}_top_posters.csv'
 +
        INTO TABLE forum_top_posters
 +
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
 +
        IGNORE 1 LINES
 +
        (rank, member, posts)
 +
        set year=$y, month=$m"
 +
fi;
 +
 
 +
if [ -f ${y}${m}_top_thanked.csv ]; then
 +
  echo "Importing ${y}${m}_top_thanked.csv";
 +
 
 +
  # rank,member,thanks
 +
  mysql -u<db_user> -p<db_password> -h <host> <database> -e \
 +
        "LOAD DATA LOCAL INFILE '${y}${m}_top_thanked.csv'
 +
        INTO TABLE forum_top_thanked
 +
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
 +
        IGNORE 1 LINES
 +
        (rank, member, thanks)
 +
        set year=$y, month=$m"
 +
fi;
 +
 
 +
</pre>
 +
 
 +
We run this script (which downloads and imports the .csv files froim the server) monthly through cron. Not sure when the files are put up on the 1st, so I get them on the 2nd:
 +
<pre>
 +
# Retrieve forum stats monthly on 2nd of month
 +
15 2 2 * * /home/dneary/bin/forum_stats.sh
</pre>
</pre>

Latest revision as of 15:01, 26 July 2011

Contents

Data sources

For SQL databases, the metrics server has access to the database server for MediaWiki, Bugzilla, and Drupal directly.

For the forum, we integrate the CSV files currently being exported, which provide the basic analytics we need.

Mailing lists are parsed by MLStats. We will use the resulting database directly in the dashboard. See #Mailing lists - MLStats

We may eventually parse git activity in repositories with gitdm, before storing the results directly in a database. For the moment, though, we extract developer information from the meego-commits mailing list.

IRC logs will be parsed with superseriousstats, a PHP command line tool that parses IRC logs and stores the results in an SQL database.

We still need to figure out how to do data interchange with Transifex and OBS. Dimitris tells me that there are already some analytics available on Transifex, and that there is a RESTful API available to query this data.

For each of the services we gather data for, here's a guide to getting that data:

Mailing lists - MLStats

Mailman mailing lists can be downloaded, parsed and stored in a MySQL database using MLStats.

The general idea is to point mlstats at the list archive page, and let it do the work of figuring out what to download.

We are carrying a small local patch to mlstats to ensure that it re-downloads the current month's archives and reparses them. The patch has been submitted upstream.

Once you have downloaded and unpacked MLStats 0.4, patched it with the patch above, and installed it with
setup.py install --prefix=/install/path
you will need to "prime the pump", and download and import the archives to all of the Maemo mailing lists.

The format of the mlstats command line is:

/path/to/mlstats --db-user=<username> --db-password=<password> http://lists.meego.com/pipermail/meego-announce/
The command line option
--no-report
suppresses the creation of a report after the import, useful for a cron job.

You can list the archives for a number of mailing lists together on the command line. The list of MeeGo mailing lists is here.

for list in meego-adaptation-intel-automotive meego-announce meego-architecture \
            meego-commits meego-community meego-dev meego-distribution-tools \
            meego-events meego-handset meego-il10n meego-inputmethods meego-it \
            meego-ivi meego-kernel meego-packaging meego-pm meego-porting \
            meego-python meego-qa meego-releases meego-sdk meego-security \
            meego-security-discussion meego-touch-dev meego-tv;
do
  /path/to/mlstats --no-report --db-user=<username> --db-password=<password> http://lists.meego.com/pipermail/${list} >> /tmp/output.txt
done;

This should be run every night through cron.

Mailing list graph over time

The SQL query (yes, a Big Hairy Beast) which can be used to create a report graphing the messages posted to each list, month by month, is this:

SELECT
     `messages`.`mailing_list_url` AS list,
     year(first_date) AS y,
     monthname(first_date) AS mon,
     month(first_date) AS m,
     date_format(first_date, '%M %Y') as monthstr,
     date_format(first_date,'%Y%m') as monthnum, 
     count(*) AS c
FROM
     `messages`
WHERE
     year(first_date)		 > 1979 and 
     mailing_list_url not like '%meego-commits%' and first_date<'2011-05-01'
GROUP BY
     `messages`.`mailing_list_url`,
     y,m
ORDER BY
     monthnum ASC,
     list ASC,
     c ASC

We use $monthnum to order the results over time, and $monthstr as a more meaningful X axis label. We eliminate all emails with bad Date headers and filter meego-commits out of our analysis, and group the messages by mailing list, to give the following graph as the final result (how to create a report will follow later):

Mlgraph.png

IRC - SuperSeriousStats

SuperSeriousStats is a super serious IRC log analyser, developed by Jos de Ruiter. It is written in PHP.

To use SuperSeriousStats, you need to install the php-cli command line interface. Once you have this, and the other dependencies for sss listed here, you will be able to run sss.

Before running it for the first time, initialise the database. You will need to download all of the IRC logs you wish to import first. Logs are expected in the Supybot format (example).

Getting the IRC logs

For the initial download, the following script should help get all the logs:

for y in 2010 2011; do
 for m in `seq -f '%02g' 1 12`; do
  for d in `seq -f '%02g' 1 31`; do
   wget http://mg.pov.lt/meego-irclog/%23meego.$y-$m-$d.log;
  done;
 done;
done;

Thereafter, you can add the following line in your cron job to get the previous day's logs:

   logdate= `date +%Y-%m-%d -d yesterday`
   wget -P $logdir http://mg.pov.lt/meego-irclog/%23meego.${logdate}.log;

Parsing the logs

To run sss, you will need to modify a number of fields in sss.conf. Notably, I had to change the filename format to "*.Y-m-d.\lo\g" to match the log file names, and set log format to supybot. I recommend creating a separate config file for each channel you want to parse. Here is the appropriate sss.conf for #meego logs:

#
#    This file contains all of sss' settings along with their defaults.
#    All values must be placed between double quotes, even empty ones.
#
#################################  Required  ###################################

channel = "#meego"		# Name of the IRC channel.
timezone = "UTC"	# Timezone the logs are in. Used for time offset
			# calculations and conversions.
			# See http://php.net/manual/en/timezones.php
db_host = "host"		# IP address or FQDN of the MySQL server.
db_port = "3306"	# Port the MySQL server is listening on.
db_user = "db_user"	# MySQL user.
db_pass = "db_password"		# MySQL password.
db_name = "db_name"		# Name of the MySQL database used for sss.
parser = "parser_supybot"	# The parser to use depending on logfile format.
			# e.g. "parser_irssi" or "parser_eggdrop"

# This string contains the format of the date within a logfile filename.
# Examples:
#   filename: #chatroom.20030131	dateformat: *.Ymd
#   filename: #chatroom.20030131	dateformat: \#c\h\atroo\m.Ymd
#   filename: chatroom.log-31012003	dateformat: *.*-dmY
#   filename: chatroom.log-31012003.gz	dateformat: *.*-dmY.\g\z
#   filename: chatroom.log-31012003.gz	dateformat: *.*-dmY.*
# See http://php.net/date_create_from_format for more specific syntax options.
logfile_dateformat = "*.Y-m-d.\lo\g"

sss has a mechanism to prevent it from re-importing log files for the same channel, by storing dates for which it has already parsed files in the table "parse_history", but you can store summary data from several channels in one database.

To do the initial import, just run

php sss.php -c meego.conf -i ${logdir} -o report.html

And to do the nightly import, put the following in a shell script, and call it from cron:

   logdate=`date +%Y-%m-%d -d yesterday`
   logdir=/path/to/logfiles
   sssdir=/path/to/sss-4.0

   /usr/bin/wget -P $logdir http://mg.pov.lt/meego-irclog/%23meego.${logdate}.log;
   /usr/bin/php ${sssdir}/sss.php -c meego.conf -i ${logdir}/\#meego.${logdate}.log

Queries

The sss report is built from a number of queries. A number of other useful sss queries are already listed in the Dashboard page


Activity by month

  select date_format(`date`, '%Y-%m') as `date`,
       sum(`l_total`) as `l_total`,
       sum(`l_night`) as `l_night`,
       sum(`l_morning`) as `l_morning`,
       sum(`l_afternoon`) as `l_afternoon`,
       sum(`l_evening`) as `l_evening`
  from `channel`
  group by year(`date`), month(`date`);

Activity by day over last 30 days

  select `date`, `l_total`, `l_night`, `l_morning`, `l_afternoon`, `l_evening` from `channel`
  where `date` > DATE_SUB(CURDATE(),INTERVAL 30 DAY);

Forums

Forum stats are available as a series of CSV files on forums.meego.com, supplied monthly. We need to download the .csv files every month (just the latest ones), parse the CSV files into a database and generate a report from that.

Downloading CSV files

To get started and load up all of the old stats, run the following:

wget -nd -P <local dir for data> -r -l1 --no-parent -A.csv http://forum.meego.com/stats

This will download all CSV files to the local directory specified.

For the monthly refresh, we use wget in a cron script as follows:

y=`/bin/date -d "1 month ago" +%Y`
m=`/bin/date -d "1 month ago" +%m`

wget -nd -P <local dir for data> -r -l1 --no-parent -A "${y}${m}*.csv" http://forum.meego.com/stats

That wget command line is worth explaining:

  • -nd: Don't create the remote directory structure when downloading the files locally
  • -P <directory>: Download files to the parent directory specified
  • -r: Recursively download
  • -l1: Limit to 1 level of directories (combining -r and -l1 allows us to download several files at the same time)
  • --no-parent: Ignore the .. link
  • -A "${y}${m}*.csv": Match filenames of the form "YYYYMM*.csv" - gives us the latest stats files only

Database schema

We created 7 tables, one for each of the statistics provided by the forum.

Here is the database schema:

--
-- Table structure for table `forum_cumulative_posts`
--

DROP TABLE IF EXISTS `forum_cumulative_posts`;
CREATE TABLE `forum_cumulative_posts` (
  `month` int(11) NOT NULL,
  `year` int(11) NOT NULL,
  `forum` varchar(50) NOT NULL DEFAULT '',
  `posts` int(11) DEFAULT NULL,
  PRIMARY KEY (`month`,`year`,`forum`)
);

--
-- Table structure for table `forum_cumulative_threads`
--

DROP TABLE IF EXISTS `forum_cumulative_threads`;
CREATE TABLE `forum_cumulative_threads` (
  `month` int(11) NOT NULL,
  `year` int(11) NOT NULL,
  `forum` varchar(50) NOT NULL DEFAULT '',
  `threads` int(11) DEFAULT NULL,
  PRIMARY KEY (`month`,`year`,`forum`)
);

--
-- Table structure for table `forum_hottest_threads`
--

DROP TABLE IF EXISTS `forum_hottest_threads`;
CREATE TABLE `forum_hottest_threads` (
  `month` int(11) NOT NULL,
  `year` int(11) NOT NULL,
  `rank` int(11) NOT NULL DEFAULT '0',
  `title` varchar(255) DEFAULT NULL,
  `posts` int(11) DEFAULT NULL,
  PRIMARY KEY (`month`,`year`,`rank`)
);

--
-- Table structure for table `forum_most_viewed_threads`
--

DROP TABLE IF EXISTS `forum_most_viewed_threads`;
CREATE TABLE `forum_most_viewed_threads` (
  `month` int(11) NOT NULL,
  `year` int(11) NOT NULL,
  `rank` int(11) NOT NULL DEFAULT '0',
  `title` varchar(255) DEFAULT NULL,
  `views` int(11) DEFAULT NULL,
  PRIMARY KEY (`month`,`year`,`rank`)
);

--
-- Table structure for table `forum_posts`
--

DROP TABLE IF EXISTS `forum_posts`;
CREATE TABLE `forum_posts` (
  `month` int(11) NOT NULL,
  `year` int(11) NOT NULL,
  `forum` varchar(50) NOT NULL DEFAULT '',
  `posts` int(11) DEFAULT NULL,
  PRIMARY KEY (`month`,`year`,`forum`)
);

--
-- Table structure for table `forum_top_posters`
--

DROP TABLE IF EXISTS `forum_top_posters`;
CREATE TABLE `forum_top_posters` (
  `month` int(11) NOT NULL,
  `year` int(11) NOT NULL,
  `rank` int(11) NOT NULL DEFAULT '0',
  `member` varchar(50) DEFAULT NULL,
  `posts` int(11) DEFAULT NULL,
  PRIMARY KEY (`month`,`year`,`rank`)
);

--
-- Table structure for table `forum_top_thanked`
--

DROP TABLE IF EXISTS `forum_top_thanked`;
CREATE TABLE `forum_top_thanked` (
  `month` int(11) NOT NULL,
  `year` int(11) NOT NULL,
  `rank` int(11) NOT NULL DEFAULT '0',
  `member` varchar(50) DEFAULT NULL,
  `thanks` int(11) DEFAULT NULL,
  PRIMARY KEY (`month`,`year`,`rank`)
);

We import the files into the databases via
LOAD DATA LOCAL INFILE
cd <local data directory>

if [ -f ${y}${m}_forum_cumulative_posts.csv ]; then
  echo "Importing ${y}${m}_forum_cumulative_posts.csv";

  # forum,posts
  mysql -u<db_user> -p<db_password> -h <host> <database> -e \
        "LOAD DATA LOCAL INFILE '${y}${m}_forum_cumulative_posts.csv'
         INTO TABLE forum_cumulative_posts
         FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
         IGNORE 1 LINES
         (forum, posts)
         set year=$y, month=$m"
fi;

if [ -f ${y}${m}_forum_cumulative_threads.csv ]; then
  echo "Importing ${y}${m}_forum_cumulative_threads.csv";

  # forum,threads
  mysql -u<db_user> -p<db_password> -h <host> <database> -e \
        "LOAD DATA LOCAL INFILE '${y}${m}_forum_cumulative_threads.csv'
         INTO TABLE forum_cumulative_threads
         FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
         IGNORE 1 LINES
         (forum, threads)
         set year=$y, month=$m"
fi;

if [ -f ${y}${m}_forum_posts.csv ]; then
  echo "Importing ${y}${m}_forum_posts.csv";

  # forum,posts
  mysql -u<db_user> -p<db_password> -h <host> <database> -e \
        "LOAD DATA LOCAL INFILE '${y}${m}_forum_posts.csv'
         INTO TABLE forum_posts
         FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
         IGNORE 1 LINES
         (forum, posts)
         set year=$y, month=$m"
fi;



if [ -f ${y}${m}_hottest_threads.csv ]; then
  echo "Importing ${y}${m}_hottest_threads.csv";

  # rank,title,posts
  mysql -u<db_user> -p<db_password> -h <host> <database> -e \
        "LOAD DATA LOCAL INFILE '${y}${m}_hottest_threads.csv'
         INTO TABLE forum_hottest_threads
         FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
         IGNORE 1 LINES
         (rank, title, posts)
         set year=$y, month=$m"
fi;

if [ -f ${y}${m}_most_viewed_threads.csv ]; then
  echo "Importing ${y}${m}_most_viewed_threads.csv";

  # rank,title,views
  mysql -u<db_user> -p<db_password> -h <host> <database> -e \
        "LOAD DATA LOCAL INFILE '${y}${m}_most_viewed_threads.csv'
         INTO TABLE forum_most_viewed_threads
         FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
         IGNORE 1 LINES
         (rank, title, views)
         set year=$y, month=$m"
fi;

if [ -f ${y}${m}_top_posters.csv ]; then
  echo "Importing ${y}${m}_top_posters.csv";

  # rank,member,posts
  mysql -u<db_user> -p<db_password> -h <host> <database> -e \
        "LOAD DATA LOCAL INFILE '${y}${m}_top_posters.csv'
         INTO TABLE forum_top_posters
         FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
         IGNORE 1 LINES
         (rank, member, posts)
         set year=$y, month=$m"
fi;

if [ -f ${y}${m}_top_thanked.csv ]; then
  echo "Importing ${y}${m}_top_thanked.csv";

  # rank,member,thanks
  mysql -u<db_user> -p<db_password> -h <host> <database> -e \
        "LOAD DATA LOCAL INFILE '${y}${m}_top_thanked.csv'
         INTO TABLE forum_top_thanked
         FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
         IGNORE 1 LINES
         (rank, member, thanks)
         set year=$y, month=$m"
fi;

We run this script (which downloads and imports the .csv files froim the server) monthly through cron. Not sure when the files are put up on the 1st, so I get them on the 2nd:

# Retrieve forum stats monthly on 2nd of month
15 2 2 * * /home/dneary/bin/forum_stats.sh
Personal tools