Release Infrastructure/BOSS/Performance/Results

From MeeGo wiki

Latest revision as of 08:05, 6 December 2010

This page describes how BOSS performance is tested and the results obtained.

Concepts

  • "Rate" measures the throughput of BOSS: the number of workflows handled per second. Measuring the rate at different workflow request scales shows what BOSS can sustain at different service levels.
  • "Load": the number of requests (workflows) sent to the engine at the same time.
  • "Iteration": one iteration begins when a batch of workflows of the given load is sent to the engine and ends when the engine has finished all of the workflows it received.

Test Method

  • The basic method in this test project is to simulate real-world use of BOSS: running as a long-lived service that continually handles multiple requests from multiple users. The performance data observed during the run is then used for the evaluation.
  • Getting the "Rate" is straightforward. Each test case sends a specified number of workflows (such as 3000 or 20000) to BOSS, either in one shot or iteratively, and records the corresponding start and end times. The rate for that test case is the number of workflows divided by the elapsed time.
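
The rate calculation described above can be sketched in a few lines of Python. This is only an illustration, not code from the test suite: the function name `compute_rate` and the timestamps are assumptions made for the example.

```python
from datetime import datetime


def compute_rate(num_workflows, start, end):
    """Return throughput in workflows per second for one test case."""
    elapsed = (end - start).total_seconds()
    if elapsed <= 0:
        raise ValueError("end time must be later than start time")
    return num_workflows / elapsed


# Illustrative timestamps only, not measured values.
start = datetime(2010, 12, 6, 8, 0, 0)
end = datetime(2010, 12, 6, 8, 1, 40)  # 100 seconds after start
print(compute_rate(3000, start, end))  # 30.0
```

For an iterative run, the same function could be applied per iteration, using the time the batch was sent and the time the engine finished the last workflow of that batch.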

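Test Environment

  • Hardware: CPU E5520 (x16 cores), RAM 16G, openSUSE 11.2
  • BOSS packages and all their dependencies, taken from OBS (https://build.opensuse.org/project/show?project=Maemo%3AMeeGo-Infra%3ABOSS)
  • A patch to fix a crash caused by an AMQP channel creation bug (https://github.com/kennethkalmer/ruote-amqp/issues/issue/3/)
  • A patch to fix a memory leak in the "yajl-ruby" library (https://github.com/brianmario/yajl-ruby/issues#issue/36)
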
Test Scripts

  • The test suite code is in the project "boss-performance-test" (http://meego.gitorious.org/meego-infrastructure-tools/boss-performance-test), which includes:
    • Config files to help set up the testing environment
    • Scripts to simulate real-world use of BOSS
    • Utilities to help analyze the various test results
  • A "lite" version is also available on the branch "New":
    • It uses your current BOSS environment directly rather than starting a new BOSS instance

Test Cases And Test Results

* Test case: Compare performance with a single worker and with multiple workers

    • Multiple workers run on the same host
    • Run the following configuration and calculate the average rate:
      • FS storage
      • load: 1k
      • iteration: 2
    • Test results:
      • Multiworkers.PNG
    • Conclusion
      • Running multiple workers on the same host does not increase performance
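
As a rough sketch of the "calculate the average rate" step, the snippet below averages the per-iteration rates for one configuration. The page does not say exactly how the average was taken, so the mean of per-iteration rates used here is an assumption, and the function name `average_rate` and the durations are hypothetical, not measured results.

```python
def average_rate(load, iteration_seconds):
    """Mean of the per-iteration rates, in workflows per second."""
    if not iteration_seconds:
        raise ValueError("need at least one iteration")
    rates = [load / seconds for seconds in iteration_seconds]
    return sum(rates) / len(rates)


# Hypothetical durations (seconds) for two iterations at load 1k.
print(average_rate(1000, [40.0, 50.0]))  # 22.5
```

Repeating this for each worker count gives one average rate per configuration, which is what a comparison between single and multiple workers would be based on.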

* Test case: Compare performance under different loads

    • Run the following configuration and calculate the average rate:
      • FS storage
      • 1 worker
      • load: 300, 500, 1k, 3k, 5k, 8k, 10k, 20k, 30k, 50k
      • iteration: 1 for each load
    • Test results:
      • Load pressure.PNG
    • Conclusion
      • Performance decreases as the load increases
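
A load sweep like the one above could be driven by a small harness along these lines. This is a sketch under stated assumptions: `sweep`, `run_iteration` and the injectable `clock` are hypothetical names, and `run_iteration` stands in for whatever actually sends the workflows to the engine and waits for them to finish. The demo uses a fake clock so that it is deterministic; it does not reproduce any measured numbers.

```python
import time

# Loads listed in the test case configuration.
LOADS = [300, 500, 1000, 3000, 5000, 8000, 10000, 20000, 30000, 50000]


def sweep(loads, run_iteration, clock=time.monotonic):
    """Run one iteration per load and return {load: workflows per second}."""
    results = {}
    for load in loads:
        started = clock()
        run_iteration(load)  # blocks until all workflows have finished
        elapsed = clock() - started
        results[load] = load / elapsed if elapsed > 0 else float("inf")
    return results


# Demo with a fake clock: 10 s for the first iteration, 20 s for the second.
fake_times = iter([0.0, 10.0, 10.0, 30.0])
demo = sweep([300, 1000], lambda load: None, clock=lambda: next(fake_times))
print(demo)  # {300: 30.0, 1000: 50.0}
```

Passing the clock as a parameter keeps the harness testable without real waits; a real run would use the default `time.monotonic` and call `sweep(LOADS, run_iteration)`.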

* Test case: Observe performance over a long run

    • Run the following configuration and calculate the average rate:
      • FS storage
      • 1 worker
      • load: 1k
      • iteration: 7000 (running for three days, about 7 million workflows in total)
    • Test results:
      • 1k_infinite_fix_leak.PNG
      • Previous Results (1k_infinite.PNG)
    • Conclusion
      • Performance stayed stable, much better than before
      • CPU and disk were busy almost all the time, which is expected
      • Memory increased by about 20M: there is still a small memory leak, but far less than before (previously it would exceed 1G in the same situation)
      • No crashes and no lost workflows
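
The figures quoted for the long run can be cross-checked with simple arithmetic. The sketch below uses only numbers from the configuration and conclusion above, with two assumptions: "three days" is taken as exactly 72 hours, and "about 20M" is read as 20 MiB. The results are therefore rough estimates, not measurements.

```python
ITERATIONS = 7000
LOAD = 1000                              # workflows per iteration
RUN_SECONDS = 3 * 24 * 3600              # "three days", assumed to be 72 hours
MEMORY_GROWTH_BYTES = 20 * 1024 * 1024   # "about 20M", assumed to be 20 MiB

total_workflows = ITERATIONS * LOAD
average_rate = total_workflows / RUN_SECONDS
growth_per_workflow = MEMORY_GROWTH_BYTES / total_workflows

print(total_workflows)                 # 7000000
print(round(average_rate, 1))          # 27.0 workflows per second
print(round(growth_per_workflow, 1))   # 3.0 bytes per workflow
```

Under these assumptions the run averages roughly 27 workflows per second, and the remaining memory growth works out to about 3 bytes per workflow.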