Release Infrastructure/BOSS/Performance/Results

From MeeGo wiki
(Difference between revisions)
Line 5: Line 5:
=== Test Method ===
=== Test Method ===
-
* Getting the "Rate" is straightforward. In each test case, we send a specified number of workflows (such as 3000, 20000...) to BOSS and record the start and end time. The "Rate" for that test case is the number of workflows divided by the elapsed time.
+
* Getting the "Rate" is straightforward. In each test case, we send a specified number of workflows (such as 3000, 20000...) to BOSS, either in one shot or iteratively, and record the corresponding start and end time. The "Rate" for that test case is the number of workflows divided by the elapsed time.
=== Test Environment ===
=== Test Environment ===
Line 21: Line 21:
=== Test Cases ===
=== Test Cases ===
* Test Case Set 1
* Test Case Set 1
-
In this case set, we test the rates for launching workflows at different scales (N ranging from several hundred to several thousand). We also compare the results for different numbers of workers, and record the CPU/MEM/DISK load.
+
In this case set, we executed the following cases:
-
  * Raw data and graph
+
  * Request 5000 workflows repeatedly (send another 5000 workflows to the engine after the previous 5000 have finished);
-
[[File:boss_performance_test_0920.PNG]]
+
  repeat 20 times (100k workflows handled in total)
-
* CPU/MEM/DISK load for 10k and 20k cases
+
[[File:boss_performance_test_5kx20.PNG]]
-
[[File:load_10k.PNG]]
+
  * Request 1000 workflows repeatedly (send another 1000 workflows to the engine after the previous 1000 have finished);
-
[[File:load_20k.PNG]]
+
    repeat 100 times (100k workflows handled in total)
 +
[[File:boss_performance_test_1kx100.PNG]]
From the above results, we can observe some interesting things:
From the above results, we can observe some interesting things:
-
  * Rates decrease as more workflows are launched at one time; that means performance decreases as more requests come in.
+
  * Rates are almost NOT decreasing (for both 1k and 5k batches, they are around "25")
  * CPU/DISK load is high during testing
  * CPU/DISK load is high during testing
  * Memory is still held after the workflows finish. This looks like a "memory leak", but it may not be one, considering the "warm cache (or GC)" behavior of languages like Ruby/Python
  * Memory is still held after the workflows finish. This looks like a "memory leak", but it may not be one, considering the "warm cache (or GC)" behavior of languages like Ruby/Python
-
  * Even with multiple workers, if they all run on one machine, performance cannot be boosted. This is as expected considering the CPU/DISK load
+
  * NO workflows lost
-
  * Some workflows were lost in some test cases. Needs further investigation
+
  * NO engine crashes
-
* The engine was observed to crash several times. Needs further investigation
+
-
+
-
 
+
-
 
+
-
* Test Case Set 2
+
-
Here we got the "response time" test results. The method is to launch 1k workflows repeatedly, each batch after the previous 1k workflows have finished. Observing the duration of each 1k batch gives the response time trend. The following are the results on VM and HW
+
-
 
+
-
[[File:Response_1k.PNG]]
+
-
 
+
-
From the above results, response time increases as more and more workflows are handled; that means the performance (or rate) is decreasing. So BOSS capacity could become very low after running for a long time. This is a potential issue and needs further investigation.
+
=== Issues Found Summarized===
=== Issues Found Summarized===
-
* Several workflows were observed to be lost (19995 of 20000 workflows; 9999 of 10000 workflows)
+
* Bug: https://projects.maemo.org/bugzilla/show_bug.cgi?id=197739
-
* The engine crashed in some cases:
+
  The fix for this bug resolves the channel closing issue, and it solved the following issues found before (http://wiki.meego.com/User_talk:Pennymax):
-
** 100k workflows launched on HW: around 60k workflows finished, then the engine crashed
+
  * Crash
-
** 30k workflows launched on a virtual machine (1 CPU, 512M): around 20k workflows finished, then the engine crashed
+
  * Workflow loss
 +
  * Performance decrease after running for a while
* Memory leak: memory used by the engine was still held even after all workflows finished
* Memory leak: memory used by the engine was still held even after all workflows finished
-
* Response time keeps increasing as more and more workflows are launched
 
=== TODO ===
=== TODO ===

Revision as of 09:09, 18 October 2010

This page describes the approach to BOSS performance testing.

Rate

  • "Rate" measures the throughput of BOSS: how many workflows can be handled in one second. Comparing the "Rate" at different workflow request scales gives a view of BOSS capability at different service levels.

Test Method

  • Getting the "Rate" is straightforward. In each test case, we send a specified number of workflows (such as 3000, 20000...) to BOSS, either in one shot or iteratively, and record the corresponding start and end time. The "Rate" for that test case is the number of workflows divided by the elapsed time.
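
As a minimal sketch of this calculation (not taken from the test suite), the "Rate" can be computed from the recorded start and end time as below. The function name `rate` and the timestamps are assumptions made up for illustration; they are not measured results.

```python
from datetime import datetime


def rate(workflow_count, start, end):
    """Return throughput in workflows per second (hypothetical helper)."""
    elapsed = (end - start).total_seconds()
    if elapsed <= 0:
        raise ValueError("end time must be later than start time")
    return workflow_count / elapsed


# Made-up timestamps for illustration only, not measured data.
start = datetime(2010, 10, 18, 9, 0, 0)
end = datetime(2010, 10, 18, 9, 2, 30)  # 150 seconds later
print(rate(3000, start, end))  # 20.0
```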

Test Environment

Test Scripts

We created a test suite (http://gitorious.org/boss-performance-test) including:

  • A client that sends multiple workflows to BOSS (that is, "launches workflows") at one time.
  • A logger plug-in for the engine, which records interesting engine-internal messages for later use in calculating the durations of workflows, participants, etc.
  • A utility to analyze the time data. It takes the earliest start time and the latest end time across all workflows to calculate the overall duration.
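
The duration calculation used by the analysis utility can be sketched as follows. This is only an illustration of the "earliest start, latest end" idea, not the code from the repository; the record layout `(workflow_id, start, end)` and the sample values are assumptions.

```python
def overall_duration(records):
    """Return the span from the earliest start to the latest end.

    `records` is an iterable of (workflow_id, start_seconds, end_seconds)
    tuples; this layout is hypothetical, chosen only for the sketch.
    """
    records = list(records)
    if not records:
        raise ValueError("no workflow records")
    earliest_start = min(start for _, start, _ in records)
    latest_end = max(end for _, _, end in records)
    return latest_end - earliest_start


# Made-up sample data: three overlapping workflows.
sample = [("wf-1", 10.0, 14.5), ("wf-2", 10.25, 18.0), ("wf-3", 11.0, 16.0)]
duration = overall_duration(sample)
print(duration)                # 8.0
print(len(sample) / duration)  # 0.375 workflows per second
```

Using the overall span rather than the sum of individual durations matters because workflows run concurrently, so their individual durations overlap.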

Test Cases

  • Test Case Set 1

In this case set, we executed the following cases:

* Request 5000 workflows repeatedly (send another 5000 workflows to the engine after the previous 5000 have finished);
  repeat 20 times (100k workflows handled in total)

Boss performance test 5kx20.PNG

* Request 1000 workflows repeatedly (send another 1000 workflows to the engine after the previous 1000 have finished);
  repeat 100 times (100k workflows handled in total)

Boss performance test 1kx100.PNG
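
The iterative cases above follow a simple loop: launch a batch, wait for it to finish, record its rate, then launch the next one. A rough sketch of that loop is shown below. `launch_batch` is a hypothetical callable standing in for the real client, and the fake clock and the 50 workflows per second figure exist only to make the demo deterministic; none of this is the actual test client.

```python
import time


def run_batches(launch_batch, batch_size, repeats, clock=time.monotonic):
    """Launch `repeats` batches back to back and return the rate of each one.

    `launch_batch(n)` is a hypothetical callable that must block until all
    n workflows of the batch have finished.
    """
    rates = []
    for _ in range(repeats):
        started = clock()
        launch_batch(batch_size)
        elapsed = clock() - started
        rates.append(batch_size / elapsed)
    return rates


class FakeClock:
    """Deterministic stand-in for a real clock, used only for this demo."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()


def fake_launch(n):
    # Pretend the engine needs exactly 1 second per 50 workflows (made up).
    clock.now += n / 50.0


rates = run_batches(fake_launch, batch_size=5000, repeats=20, clock=clock)
print(len(rates), rates[0], rates[-1])  # 20 50.0 50.0
```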

From the above results, we can observe some interesting things:

* Rates are almost NOT decreasing (for both 1k and 5k batches, they are around "25")
* CPU/DISK load is high during testing
* Memory is still held after the workflows finish. This looks like a "memory leak", but it may not be one, considering the "warm cache (or GC)" behavior of languages like Ruby/Python
* NO workflows lost
* NO engine crashes
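
One simple way to check whether per-batch rates are decreasing over a run is to compare the average of the first few batches with the average of the last few. The sketch below shows that idea; the helper name `relative_drop`, the window size, and both series are assumptions made up for illustration, not the measured data from the graphs.

```python
from statistics import mean


def relative_drop(rates, window=5):
    """Return the fractional drop from the first `window` to the last `window` rates.

    0.0 means no change; 0.3 means the rate fell by 30 percent.
    """
    if len(rates) < 2 * window:
        raise ValueError("need at least 2 * window samples")
    head = mean(rates[:window])
    tail = mean(rates[-window:])
    return (head - tail) / head


# Synthetic series, not measurements.
steady = [26.0] * 20
degrading = [26.0 - 0.5 * i for i in range(20)]
print(relative_drop(steady))     # 0.0
print(relative_drop(degrading))  # 0.3
```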

Issues Found Summarized

  • Bug: https://projects.maemo.org/bugzilla/show_bug.cgi?id=197739
    The fix for this bug resolves the channel closing issue, and it solved the following issues found before (http://wiki.meego.com/User_talk:Pennymax):
    * Crash
    * Workflow loss
    * Performance decrease after running for a while
  • Memory leak: memory used by the engine was still held even after all workflows finished

TODO

  • Distributed workers
  • Other storage
  • Think about issues found