Contents |
There are two ways to create a Pentaho report: the first is a method to create simple tabular reports using the Pentaho user console "ad-hoc reporting" capability, but as the more powerful method, we will concentrate on the Pentaho Report Designer, available from Sourceforge.
There are four basic steps to creating a report in PRD: 1. Make a database connection available as a JNDI source for the Pentaho platform 2. Duplicate the JDNI configuration locally for PRD (since the designer will not have direct access to the Pentaho database) 3. Create queries which will be made available to the report designer 4. Lay out the report to make the results of those queries available to the user (as a graph, tabular data, etc)
The easiest way to configure a new JNDI connection is to start the Pentaho Administration Console (found in $PENTAHO_JOME/administration_console) with "start_pac.sh". The reason we use JNDI rather than a direct JDBC connection is that when we deploy the report to a different instance, where the username, password and database host are potentially different, we don't have to change the report, we simply use the same symbolic name for the JNDI source.
In Pentaho Admin Console (PAC), click "Administration", then "Database connections", and create a new connection. For the forum database, for example, the local parameters here were:
Then click "Test", and if everything is fine, you should see "Connection test successful".
Potential pitfalls are if you are using c3p0 libraries to do connection pooling as described in the Pentaho wiki, but have not included c3p0.jar on the classpath for PAC, or if the database user does not have the appropriate permissions or access for the database. You can troubleshoot permissions issues using the MySQL command line client.
Before you can use a "pure" JNDI connection in PRD, you need to declare it locally to make it available.
Open the file~/.pentaho/simple-jndi/default.propertiesand add this to the bottom:
Forum/type=javax.sql.DataSource Forum/driver=com.mysql.jdbc.Driver Forum/user=<db_user> Forum/password=<sure_ill_tell_you> Forum/url=jdbc:mysql://localhost:3306/forum
Hopefully the parallels between this configuration and the JNDI configuration in PAC are obvious! The prefix "Forum/" represents the symbolic name of the JNDI connection.
Once you have downloaded PRD, you can run report_designer.sh directly from the directory in which you uncompressed the package. PRD looks like a typical Java application (that is to say, on Linux it's not very pretty).
Create a new report with File->New, and you will see on the right-hand side of the screen a pane with two tabs, "Data" and "Structure". PRD is semi-WYSIWYG. Reports have a fixed structure, and each part of the report is evaluated based on a query associated with the report. The group header, for example, will appear once for every group defined in a query, what we put in the data section will be evaluated once for each row returned by the query, and the report header will be evaluated only once per report generation. Styling of elements in the report is cascading and hierarchical: if I set a font family for the report root, this will cascade throughout the report. The styling syntax is very similar to CSS.
To make a report do anything useful, we much associate a query with it. If we want to use several queries in one single document, we must declare all the queries at the top level, and then use sub-reports for each query to display.
Click on the Data tab in PRD. We can define a number of sources of data for the report by right-clicking on "Data sets". For a database source, we use the JDBC source type.
On the screen which pops up, we can add a new connection by clicking on the + icon above the connection names, and then choose "JNDI" as our connection type, and set the name of the connection for PRD with the name of the JNDI connection we configured before (see screenshot).
Test the connection with the "Test" button, then click "OK" to finalise things and make the connection available for queries.
Once the connection is correctly configured, you can create and edit queries. Click the "+" icon above the "Available queries" frame, type in a name, and you can start typing directly into the Query text entry area.
There is a graphical tool to help you join tables and perform more complicated wueries, which you can access by double-clicking the pencil icon above this area. By clicking on it, you will be brought to the SQL query designer. I have found this useful in the early stages of writing a queries, for simple joins between tables and for situations where I am not using any SQL functions, but for polishing queries, I tend to use the raw text.
For now, let's create a query to list the top 20 posters for last month. Drag the "top_posters" table onto the query canvas area on the right; all of the fields will be selected by default, and that's fine. Just to test the query, start with "where month=6 and year=2011" as the condition.
The final query is:
SELECT
month,
year,
rank,
member,
posts
FROM
forum_top_posters
WHERE
month=6
and year=2011
LIMIT 20
and when you hit "Preview", you should get a data table like this:
| month | year | rank | member | posts |
|---|---|---|---|---|
| 6 | 2011 | 1 | qgil | 185 |
| 6 | 2011 | 2 | sjgadsby | 115 |
| 6 | 2011 | 3 | shmerl | 104 |
| 6 | 2011 | 4 | texrat | 77 |
| 6 | 2011 | 5 | jaffa | 60 |
| 6 | 2011 | 6 | swift11 | 52 |
| 6 | 2011 | 7 | jonay | 42 |
| 6 | 2011 | 8 | mja | 35 |
| 6 | 2011 | 9 | vitna | 35 |
| 6 | 2011 | 10 | timoph | 35 |
| 6 | 2011 | 11 | hartti | 29 |
| 6 | 2011 | 12 | lm | 27 |
| 6 | 2011 | 13 | rubear2009 | 26 |
| 6 | 2011 | 14 | andre | 25 |
| 6 | 2011 | 15 | profebral | 23 |
| 6 | 2011 | 16 | mikecomputing | 21 |
| 6 | 2011 | 17 | ritratt | 21 |
| 6 | 2011 | 18 | helex | 19 |
| 6 | 2011 | 19 | cckwes | 19 |
| 6 | 2011 | 20 | pycage | 19 |
Hit OK, and now your query is selected for use with your report.
To make the query always refer to "last month", you need to resort to datetime functions in MySQL in the WHERE clause.
WHERE month = month(date_add(now(), INTERVAL -1 MONTH)) AND year = year(date_add(now(), INTERVAL -1 MONTH))