Release Infrastructure/BOSS/Performance/Results

From MeeGo wiki

Latest revision as of 08:05, 6 December 2010

This page describes how BOSS performance is tested and records the results.

Concepts

  • "Rate" measures the throughput of BOSS: how many workflows it can handle in one second. Measuring the rate at different workflow request scales shows what BOSS can deliver at different service levels.
  • "Load": how many requests (workflows) are sent to the engine at the same time.
  • "Iteration": one iteration begins when a load of workflows is sent to the engine and ends when the engine has finished all of the workflows it received.
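
The "Rate" definition above can be sketched as a small calculation. This is only an illustration, not code from the test suite: the function name `rate` and the timestamps are assumptions made for the example.

```python
from datetime import datetime

def rate(workflow_count, start, end):
    """Return workflows handled per second between two datetimes."""
    elapsed = (end - start).total_seconds()
    if elapsed <= 0:
        raise ValueError("end must be later than start")
    return workflow_count / elapsed

# Hypothetical iteration: a load of 1000 workflows finished in 40 seconds.
start = datetime(2010, 12, 6, 8, 0, 0)
end = datetime(2010, 12, 6, 8, 0, 40)
example_rate = rate(1000, start, end)
print(example_rate)  # 25.0
```

Here `start` is the moment the load is sent and `end` is the moment the last workflow finishes, which matches the boundaries of one iteration.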

Test Method

  • The basic method in this test project is to simulate real-world use of BOSS: running as a service for a long time and continually handling multiple requests from multiple users. The performance data is then observed to evaluate BOSS.
  • Getting the "Rate" is straightforward. In each test case, a specified number of workflows (such as 3000 or 20000) is sent to BOSS, either in one shot or iteratively, and the corresponding start and end times are recorded. The rate for that test case follows from the workflow count and the elapsed time.
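
The iterative method can be sketched roughly as below. The hooks `launch` and `wait_all_finished` are hypothetical stand-ins for the real client and engine, not part of BOSS or the test suite, and the fake clock exists only so the sketch runs without a BOSS instance.

```python
import time

def run_iterations(launch, wait_all_finished, load, iterations, clock=time.monotonic):
    """Run `iterations` rounds of `load` workflows and return a rate per round.

    `launch(n)` and `wait_all_finished()` are hypothetical hooks standing in
    for the real client and engine; they are not part of the BOSS API.
    """
    rates = []
    for _ in range(iterations):
        started = clock()
        launch(load)            # send `load` workflows to the engine at once
        wait_all_finished()     # block until the engine has finished them all
        elapsed = clock() - started
        rates.append(load / elapsed if elapsed > 0 else float("inf"))
    return rates

# Dry run with a fake engine and a fake clock, so no BOSS instance is needed.
fake_now = [0.0]
sent = []

def fake_launch(n):
    sent.append(n)

def fake_wait():
    fake_now[0] += 50.0  # pretend each iteration takes 50 seconds

rates = run_iterations(fake_launch, fake_wait, load=1000, iterations=2,
                       clock=lambda: fake_now[0])
print(rates)  # [20.0, 20.0]
```

A one-shot test is the same loop with `iterations=1`.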

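Test Environment

  • Hardware and OS: CPU E5520 (x16 cores), RAM 16G, openSUSE 11.2
  • BOSS packages and all their dependencies, taken from OBS (https://build.opensuse.org/project/show?project=Maemo%3AMeeGo-Infra%3ABOSS)
  • A patch to fix a crash caused by an AMQP channel creation bug (https://github.com/kennethkalmer/ruote-amqp/issues/issue/3/)
  • A patch to fix a memory leak in the "yajl-ruby" library (https://github.com/brianmario/yajl-ruby/issues#issue/36)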

Test Scripts

  • The test suite code is in the project "boss-performance-test" (http://meego.gitorious.org/meego-infrastructure-tools/boss-performance-test), which includes:
    • Config files to help set up the testing environment
    • Scripts to simulate real-world use of BOSS
    • Utilities to help analyze the various test results
  • A "lite" version is also available on the branch "New":
    • It uses your current BOSS environment directly rather than starting a new BOSS instance

Test Cases And Test Results

* Test case: Compare performance using a single worker and multiple workers

    • Multiple workers run on the same host
    • Get the rate by testing the following config, then calculate the average rate
      • FS storage
      • load: 1k
      • iteration: 2
    • Test results:
      • Multiworkers.PNG
    • Conclusion
      • Running multiple workers on the same host does not increase performance
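
The page does not say exactly how the average rate is computed. The sketch below shows two plausible readings, assuming every iteration uses the same load: the mean of the per-iteration rates, and the total workflow count divided by the total time. The durations are hypothetical, not measured values.

```python
from statistics import mean

def average_rates(load, durations):
    """Return (mean of per-iteration rates, overall rate) for equal-load iterations."""
    per_iteration = [load / d for d in durations]
    overall = load * len(durations) / sum(durations)
    return mean(per_iteration), overall

# Hypothetical durations in seconds for two iterations of load 1k.
mean_rate, overall_rate = average_rates(1000, [40.0, 50.0])
print(mean_rate, overall_rate)
```

The two values differ whenever iteration durations differ, so comparisons between configs should use the same definition throughout.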

* Test case: Compare performance using different loads

    • Get the rate by testing the following config, then calculate the average rate
      • FS storage
      • 1 worker
      • load: 300, 500, 1k, 3k, 5k, 8k, 10k, 20k, 30k, 50k
      • iteration: 1 for each load
    • Test results:
      • Load pressure.PNG
    • Conclusion
      • Performance decreases as load increases

* Test case: Observe performance over a long run

    • Get the rate by testing the following config, then calculate the average rate
      • FS storage
      • 1 worker
      • load: 1k
      • iteration: 7000 (ran for three days, about 7 million workflows in total)
    • Test results:
      • 1k infinite fix leak.PNG
      • Previous Results (1k_infinite.PNG)
    • Conclusion
      • Performance stayed stable - much better than before
      • CPU and disk were busy almost all the time, which is expected
      • Memory grew by about 20M, so a small leak remains, but this is far better than before (previously >1G in the same situation)
      • No crashes and no lost workflows - good!
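
As a rough cross-check of the long-run figures, the config itself implies an overall rate. This is back-of-the-envelope arithmetic, not a measured result; it assumes "three days" means exactly 72 hours, which the page does not state.

```python
load = 1000                  # workflows per iteration (from the test config)
iterations = 7000            # iterations in the long run (from the test config)
run_seconds = 3 * 24 * 3600  # "three days", taken as exactly 72 hours (assumption)

total_workflows = load * iterations
implied_rate = total_workflows / run_seconds

print(total_workflows)         # 7000000
print(round(implied_rate, 1))  # 27.0
```

The total matches the "about 7 million workflows" stated in the config, and the implied rate is only as precise as the "three days" figure.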