Meego Wiki

User:Tgalvin/monitoring

From MeeGo wiki

Monitoring

A first analysis of monitored data based on current requirements

Tables

Testrun

@type testrun_id: C{str} @param testrun_id: The testrun id

@type device_group: C{str} @param device_group: The name of the device group

@type queue: C{str} @param queue: The type of build from CI perspective

@type configuration: C{str} @param configuration: What's contained in the image

@type host_worker_instances: C{list} of C{tuple} of C{str}, C{int} @param host_worker_instances: The host and worker instance pairs, e.g. worker 1 instance 1

@type requestor: C{str} @param requestor: The first email on the list

@type request_id: C{str} @param request_id: An identifier for the request from the client

@type error: C{Exception} @param error: An exception associated with the run
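
As a rough illustration of how the fields above might be carried around in code, here is a minimal sketch using a dataclass. The choice of a dataclass and all the sample values are assumptions made for this example; only the field names and types come from the list above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Testrun:
    """One row of the Testrun table; field names follow the list above."""
    testrun_id: str
    device_group: str
    queue: str            # type of build from the CI perspective
    configuration: str    # what is contained in the image
    requestor: str        # first email on the list
    request_id: str       # client-side identifier for the request
    # (host, worker instance number) pairs, e.g. ("worker1", 1)
    host_worker_instances: List[Tuple[str, int]] = field(default_factory=list)
    error: Optional[Exception] = None  # None when no exception is associated


# Hypothetical values, for illustration only.
run = Testrun(
    testrun_id="42",
    device_group="device_group_1",
    queue="example_queue",
    configuration="example_configuration",
    requestor="someone@example.com",
    request_id="req-0001",
    host_worker_instances=[("worker1", 1)],
)
print(run.error is None)
```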

Events

Currently there are requirements for the following:

  • Test run request
  • Task posted on the queue
  • Task pulled from queue
  • Hardware Operation (e.g. flashing)
  • Device Boot

These are persisted as:

| testrun_id | event_name | event_emit | event_receive |

Where:

  • event_name is the name of the event
  • event_emit is the timestamp the event was emitted
  • event_receive is the timestamp when the event was received
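
One thing the two timestamps make possible is measuring how long an event took to reach the monitor. The sketch below assumes the emitter and receiver clocks are comparable; `EventRow`, `receive` and `transport_latency` are hypothetical names introduced for this example.

```python
import time
from collections import namedtuple

# Column names follow the table layout above.
EventRow = namedtuple(
    "EventRow", "testrun_id event_name event_emit event_receive")


def receive(testrun_id, event_name, event_emit, clock=time.time):
    """Build a row for an event, stamping the receive time on arrival.

    `clock` is injectable so that the example is deterministic.
    """
    return EventRow(testrun_id, event_name, event_emit, clock())


def transport_latency(row):
    """Time between the event being emitted and being received."""
    return row.event_receive - row.event_emit


# Hypothetical event with fixed timestamps.
row = receive("42", "flash_start", event_emit=100.0, clock=lambda: 100.5)
print(transport_latency(row))
```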


Statistics

The monitoring events are then used to generate the statistics

Testrun

TimeDeltas

e.g.

  • testrun_dt = results_reported - request_time
  • task_wait_dt = task_pulled - task_posted
  • hw_event_time = hw_event_finish - hw_event_start

etc...
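
A minimal sketch of how such deltas could be derived from persisted events. It borrows the event names from the dummy data script on this page (`request`, `queue_start`, `publish` and so on); mapping them onto `results_reported`, `task_posted` etc. is an assumption, as is the use of the emit timestamp.

```python
def time_delta(events, testrun_id, start_event, end_event):
    """Return emit(end_event) - emit(start_event) for one testrun.

    `events` is an iterable of (testrun_id, event_name, event_emit) tuples.
    If an event name occurs more than once, the last occurrence wins.
    """
    emitted = {name: emit for run, name, emit in events if run == testrun_id}
    return emitted[end_event] - emitted[start_event]


# Hypothetical timestamps in arbitrary units.
events = [
    ("7", "request", 700),
    ("7", "queue_start", 702),
    ("7", "queue_end", 705),
    ("7", "flash_start", 705),
    ("7", "flash_end", 720),
    ("7", "publish", 760),
]

testrun_dt = time_delta(events, "7", "request", "publish")
task_wait_dt = time_delta(events, "7", "queue_start", "queue_end")
hw_event_time = time_delta(events, "7", "flash_start", "flash_end")
print(testrun_dt, task_wait_dt, hw_event_time)
```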

No of Iterations

C{int}

Simple count of the number of events of a given name

e.g. number of flash iterations
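
The count could be as simple as the following sketch. The helper name `iterations` and the sample event names are hypothetical.

```python
from collections import Counter


def iterations(event_names, name):
    """Count how many times an event called `name` occurs."""
    return Counter(event_names)[name]


# Hypothetical event names for one testrun in which flashing was retried.
names = ["request", "flash_start", "flash_end",
         "flash_start", "flash_end", "boot_start"]
flash_iterations = iterations(names, "flash_start")
print(flash_iterations)
```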

Device Group

  • Requestors ordered by number of requests made
  • The ongoing testruns
  • The queued testruns
  • The number of testruns ending in error
  • The percentage of testruns ending in error
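
Two of the statistics above, sketched in plain Python. The dictionary layout of a testrun and the sample data are assumptions made for this example, not the persisted format.

```python
from collections import Counter


def requestors_by_requests(testruns):
    """Requestors ordered by number of requests made, busiest first."""
    return Counter(t["requestor"] for t in testruns).most_common()


def error_stats(testruns):
    """Return (count, percentage) of testruns that ended in error."""
    failed = sum(1 for t in testruns if t["error"] is not None)
    percent = 100.0 * failed / len(testruns) if testruns else 0.0
    return failed, percent


# Hypothetical testruns for one device group.
testruns = [
    {"requestor": "a@example.com", "error": None},
    {"requestor": "b@example.com", "error": "flash failed"},
    {"requestor": "a@example.com", "error": None},
    {"requestor": "a@example.com", "error": "boot timeout"},
]
print(requestors_by_requests(testruns))
print(error_stats(testruns))
```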

Time Stats

The min, max, average (std dev) for:

  • testrun
  • flash
  • boot
  • tests
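
A sketch of the summary for one phase, using only the standard library. Whether the sample or the population standard deviation is wanted is not stated above; the sample form is assumed here, and the durations are made up.

```python
import statistics


def time_stats(durations):
    """Return min, max, average and sample standard deviation."""
    return {
        "min": min(durations),
        "max": max(durations),
        "average": statistics.mean(durations),
        # stdev needs at least two data points
        "std_dev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
    }


# Hypothetical flash durations.
flash_durations = [10, 12, 14, 16, 18]
stats = time_stats(flash_durations)
print(stats)
```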

Lash Up

http://www.sarasola.co.uk/demo/testrun_bars.html

Purpose

  • Drive requirements discussion.
  • Kick the tyres on some candidate technologies

Model

This uses the concept of "Events" which provide waypoints through the testrun journey.

For this I extended the hdf hack to contain more data (code at the end of this wiki)

Data Acquisition / Graph Generation

Using PyTables means the data for the graph comes out as a numpy record array. Record arrays are a convenient way of handling tabulated numerical data and are much quicker (in terms of both coding and runtime speed) than using object models and the like.

I put the querying for the charts in a generator. The idea here is that the latency is hidden from the user so the extraction and calculations can be done on demand while the user is entertained with results already shown.
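
The idea can be shown in isolation with a small sketch. `chart_data_iter` and `slow_query` are hypothetical stand-ins for the real query and delta calculation; the point is that only the first page is computed before something can be shown.

```python
def iter_chunks(items, step_size):
    """Yield successive chunks lazily; nothing is computed until asked for."""
    for start in range(0, len(items), step_size):
        yield items[start:start + step_size]


def chart_data_iter(testrun_ids, step_size, compute):
    """Yield one chart's worth of data at a time.

    `compute` stands in for the expensive query and delta calculation.
    """
    for chunk in iter_chunks(testrun_ids, step_size):
        yield [compute(testrun_id) for testrun_id in chunk]


calls = []


def slow_query(testrun_id):
    calls.append(testrun_id)  # record which ids were actually queried
    return testrun_id * 2


pages = chart_data_iter(list(range(10)), step_size=4, compute=slow_query)
first_page = next(pages)
print(first_page, calls)
```

Only the four ids of the first page have been queried at this point; the remaining pages are computed when the consumer asks for them.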

Again - The code is shown below.

Graph

Whilst I used pygooglechart for the demo, it turns out to be unsuitable: it sends the chart data to an external web service, which compromises privacy.

If the stacked bar chart is considered useful then we should look for a library that provides this. GChart or Chaco might be worth looking at.

Again... the code is below and porting to another library should be straightforward.

Front End

To get the rich user interaction Pyjamas was used.

The tabpanelwidget demo was pretty much used verbatim.

Framework

These hacks separate the MVC elements, but I've yet to string them together. I have used technologies that allow a demo to be created quickly, that are adaptable and that could conceivably be used in production. The Django framework could be used to house the elements, whatever choices we decide to make.

Scripts

Spike code ... Not production quality

Dummy Data

Generates some events and saves to hdf

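import random
from tables import openFile, IsDescription, StringCol, Int32Col

# The constants and the Event description were not shown in the original
# spike; the values below are assumptions chosen to match the Chart Generator.
FILENAME = "events.h5"
DEVICE_GROUP = "device_group_1"
NO_OF_RUNS = 1000

class Event(IsDescription):
    testrun_id = StringCol(16)
    ev_name = StringCol(16)
    emit = Int32Col()
    received = Int32Col()

def dummy_run():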
    h5file = openFile(FILENAME, mode = "w", title = "demo")
    group = h5file.createGroup("/", DEVICE_GROUP, 'Device Group')
    table = h5file.createTable(group, 'events', Event, "Event example")
    event = table.row
    for i in xrange(NO_OF_RUNS):
        event['testrun_id']  = str(i)
        event['ev_name'] = "request"
        req = i*100 + random.randint(0,20)
        event['emit'] = req
        event['received'] = req + 1
        event.append()
        #
        event['testrun_id']  = str(i)
        event['ev_name'] = "queue_start"
        q_s =  req + random.randint(1,2)
        event['emit'] = q_s 
        event['received'] = q_s 
        event.append()
        #
        event['testrun_id']  = str(i)
        event['ev_name'] = "queue_end"
        q_e = q_s + random.randint(1,5)
        event['emit'] = q_e 
        event['received'] = q_e + 1
        event.append()
        #
        event['testrun_id']  = str(i)
        event['ev_name'] = "flash_start"
        event['emit'] = q_e
        event['received'] = q_e + 1
        event.append()
        #
        event['testrun_id']  = str(i)
        event['ev_name'] = "flash_end"
        f_e = q_e + random.randint(10,20)
        event['emit'] = f_e
        event['received'] = f_e + 1
        event.append()
        #
        event['testrun_id']  = str(i)
        event['ev_name'] = "boot_start"
        event['emit'] = f_e
        event['received'] = f_e + 1
        event.append()
        #
        event['testrun_id']  = str(i)
        event['ev_name'] = "boot_end"
        b_e = f_e + random.randint(5,10)
        event['emit'] = b_e
        event['received'] = b_e + 1
        event.append()
        #
        event['testrun_id']  = str(i)
        event['ev_name'] = "test_start"
        event['emit'] = b_e
        event['received'] = b_e + 1
        event.append()
        #
        event['testrun_id']  = str(i)
        event['ev_name'] = "test_end"
        t_e = b_e + random.randint(20,40)
        event['emit'] = t_e
        event['received'] = t_e + 1
        event.append()
        #
        event['testrun_id']  = str(i)
        event['ev_name'] = "publish"
        event['emit'] = t_e + 5
        event['received'] = t_e + 5
        event.append()
        #
    h5file.close()

Chart Generator

import os
import sys
import math
import tables
from itertools import groupby
ROOT = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, os.path.join(ROOT, '..'))


FILENAME = "events.h5"

from pygooglechart import StackedHorizontalBarChart
from pygooglechart import Axis
from numpy import unique
STEP = 20
WIDTH = 600
HEIGHT = 400
X_RANGE = 80
def _mask(ra, testrun_ids):
    ret_val = None   
    for testrun_id in testrun_ids:
        mask = (ra["testrun_id"] == testrun_id)
        if ret_val is not None:
            ret_val = ret_val | mask
        else:
            ret_val = mask
    return ret_val
def _rev_testrun_ids(ra):
    a=  [k for k, g in groupby(ra["testrun_id"])]
    a.reverse()
    return a
   
def _iter_testrun_chunk(step_size):
    f = tables.openFile(FILENAME)
    try:
        events = f.root.device_group_1.events
        ra = events.read()
    finally:
        f.close()
    rev_testrun_ids = _rev_testrun_ids(ra)
    # Start at 0 so that the final (possibly short) chunk is not dropped
    for start in range(0, len(rev_testrun_ids), step_size):
        m = _mask(ra, rev_testrun_ids[start : start + step_size])
        yield ra[m]
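def _deltas(ra, start_event, end_event):
    # Assumes every testrun in ra has exactly one start and one end event,
    # stored in the same testrun order, so the two selections line up
    start_mask = (ra["ev_name"] == start_event)
    end_mask = (ra["ev_name"] == end_event)
    return ra["received"][end_mask] - ra["received"][start_mask]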
def get_bar_chart(testrun_ids):
    chart = StackedHorizontalBarChart(WIDTH, 
                                         HEIGHT,
                                         x_range=(0, X_RANGE))
    axis = range(0, X_RANGE + 1, 10)
    chart.set_axis_labels(Axis.LEFT, testrun_ids)
    chart.set_title("Testrun Time and Motion Analysis")
    #
    chart.set_axis_labels(Axis.BOTTOM, axis)
    index = chart.set_axis_labels(Axis.BOTTOM, ['Time / min'])
    chart.set_axis_style(index, '202020', font_size=10, alignment=0)
    chart.set_axis_positions(index, [50])
    chart.set_bar_width(10)        
    return chart
def chart_iter():
    for ra in _iter_testrun_chunk(STEP):
        chart = get_bar_chart(_rev_testrun_ids(ra))
        chart.set_colours(['a5cee3', '1f78b3', 
                           'b2de89','fcbf6f','fb9a99', '693d9a'])
        chart.set_legend(['initialisation', 'queue', 'flash', 
                          'boot', 'test', 'publish'])
        chart.add_data(_deltas(ra, "request", "queue_start"))
        chart.add_data(_deltas(ra, "queue_start", "queue_end"))
        chart.add_data(_deltas(ra, "flash_start", "flash_end"))
        chart.add_data(_deltas(ra, "boot_start", "boot_end"))
        chart.add_data(_deltas(ra, "test_start", "test_end"))
        chart.add_data(_deltas(ra, "test_end", "publish"))
        yield chart
def main():    
    for i, chart in enumerate(chart_iter()):
        chart.download("dts%s.png"%i)
        if i >10:
            break
if __name__ == '__main__':
    main()

Querying Performance

This is not a formal benchmark; the figures are *for initial indication only*.

I have calculated the times to get the `timedeltas` for each step (flash, boot etc) on a dataset of 1000 testruns

Both sets of code are crude hacks to give some initial indications. No doubt both could be optimised.

Django (via ORM)

10.88 seconds

hdf + numpy

0.52 seconds
