Release Infrastructure/BOSS/Performance/Results

Latest revision as of 08:05, 6 December 2010

This page describes how BOSS performance is tested.

Concepts

  • "Rate" measures the throughput of BOSS: how many workflows it can handle per second. Measuring the rate at different workflow request scales shows what BOSS can deliver at different service levels.
  • "Load": how many requests (workflows) are sent to the engine at the same time.
  • "Iteration": one iteration begins when the specified load of workflows is sent to the engine and ends when the engine has finished all the workflows it received.

Test Method

  • The basic method in this test project is to simulate real-world use of BOSS: running as a service for a long time and continually handling multiple requests from multiple users. The performance data is then observed to evaluate BOSS.
  • Getting the "Rate" is straightforward. Each test case sends a specified number of workflows (such as 3000 or 20000) to BOSS, either in one shot or iteratively, and records the corresponding start and end times. The "Rate" for that test case follows from the workflow count and the elapsed time.

Test Environment

  • Hardware: CPU E5520 (x16 cores), RAM 16G, openSUSE 11.2
  • BOSS packages and all their dependencies taken from OBS (https://build.opensuse.org/project/show?project=Maemo%3AMeeGo-Infra%3ABOSS)
  • A patch to fix a crash caused by an AMQP channel creation bug (https://github.com/kennethkalmer/ruote-amqp/issues/issue/3/)
  • A patch to fix a memory leak in the "yajl-ruby" library (https://github.com/brianmario/yajl-ruby/issues#issue/36)
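
The "Rate" calculation described under Test Method (workflow count divided by the time between the recorded start and end) can be sketched as below. This is a minimal illustration, not code from the test suite: the function name `rate` and the example timestamps are assumptions made up for this sketch.

```python
from datetime import datetime


def rate(workflow_count: int, start: datetime, end: datetime) -> float:
    """Return workflows handled per second between start and end."""
    elapsed = (end - start).total_seconds()
    if elapsed <= 0:
        raise ValueError("end must be later than start")
    return workflow_count / elapsed


# Hypothetical timings, for illustration only (not measured results).
start = datetime(2010, 12, 1, 10, 0, 0)
end = datetime(2010, 12, 1, 10, 2, 0)

example_rate = rate(3000, start, end)  # 3000 workflows in 120 seconds
print(f"{example_rate:.1f} workflows/s")  # prints "25.0 workflows/s"
```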

Test Scripts

  • The test suite code is in the project "boss-performance-test" (http://meego.gitorious.org/meego-infrastructure-tools/boss-performance-test), which includes:
    • Config files to help set up the testing environment
    • Scripts that simulate real-world use of BOSS
    • Utilities to help analyze the test results
  • A "lite" version is also available on the branch "New":
    • It uses your current BOSS environment directly rather than starting a new BOSS instance

Test Cases And Test Results

* Test case: Compare performance using a single worker and multiple workers

    • Multiple workers run on the same host
    • Measure the rate with the following configuration, then calculate the average rate
      • FS storage
      • load: 1k
      • iteration: 2
    • Test results:
      • Multiworkers.PNG
    • Conclusion
      • Running multiple workers on the same host does not increase performance
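
Measuring a rate per iteration and then averaging, as in the configuration above (load 1k, 2 iterations), might look like the following sketch. It is not taken from the test suite: `run_iterations`, `average`, and the `run_load` hook are hypothetical names, and `fake_engine` is a stand-in that only sleeps instead of talking to a real BOSS engine.

```python
import time
from typing import Callable, List


def run_iterations(load: int, iterations: int,
                   run_load: Callable[[int], None]) -> List[float]:
    """Run the given number of iterations and return the rate of each one.

    run_load(load) is a hypothetical hook: it must send `load` workflows to
    the engine and return only when the engine has finished all of them.
    """
    rates = []
    for _ in range(iterations):
        started = time.perf_counter()
        run_load(load)
        elapsed = time.perf_counter() - started
        rates.append(load / elapsed)
    return rates


def average(values: List[float]) -> float:
    """Arithmetic mean of the per-iteration rates."""
    return sum(values) / len(values)


def fake_engine(load: int) -> None:
    # Stand-in for a real engine: pretend each workflow takes 10 microseconds.
    time.sleep(load * 1e-5)


rates = run_iterations(load=1000, iterations=2, run_load=fake_engine)
print(f"average rate: {average(rates):.0f} workflows/s")
```

The printed number only reflects the sleep in `fake_engine`; it says nothing about real BOSS throughput.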

* Test case: Compare performance using different loads

    • Measure the rate with the following configuration, then calculate the average rate
      • FS storage
      • 1 worker
      • load: 300, 500, 1k, 3k, 5k, 8k, 10k, 20k, 30k, 50k
      • iteration: 1 for each load
    • Test results:
      • Load pressure.PNG
    • Conclusion
      • Performance decreases as load increases
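
One way to check a conclusion like the one above is to compute a rate for each load and confirm that it falls as the load grows. The sketch below assumes the elapsed time of each single-iteration run has already been recorded. The durations are invented placeholders chosen only to make the example run; they are not the measurements behind the figure, and the function names are hypothetical.

```python
from typing import Dict


def rates_by_load(durations: Dict[int, float]) -> Dict[int, float]:
    """Map each load (workflow count) to its rate in workflows per second."""
    return {load: load / seconds for load, seconds in durations.items()}


def rate_decreases_with_load(rates: Dict[int, float]) -> bool:
    """True if every larger load has a strictly lower rate than the previous one."""
    ordered = [rates[load] for load in sorted(rates)]
    return all(later < earlier for earlier, later in zip(ordered, ordered[1:]))


# Placeholder durations in seconds, NOT measured results.
placeholder_durations = {300: 10.0, 1000: 40.0, 5000: 250.0}

rates = rates_by_load(placeholder_durations)
for load in sorted(rates):
    print(f"load {load}: {rates[load]:.1f} workflows/s")
print("rate decreases with load:", rate_decreases_with_load(rates))
```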

* Test case: Observe performance during a long run

    • Measure the rate with the following configuration, then calculate the average rate
      • FS storage
      • 1 worker
      • load: 1k
      • iteration: 7000 (running for three days, about 7 million workflows in total)
    • Test results:
      • 1k_infinite_fix_leak.PNG (previous results: 1k_infinite.PNG)
    • Conclusion
      • Performance stayed stable, much better than before
      • CPU and disk were busy almost all the time, which is expected
      • Memory increased by about 20M. This is still a small memory leak, but much better than before (previously more than 1G in the same situation)
      • No crashes and no lost workflows - good!
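
As a rough cross-check of the long-run figures above, the total workflow count and an approximate average rate can be worked out from the stated load, iteration count, and duration. "Three days" and "about 7 million" are approximate in the source, so the rate below is only an order-of-magnitude estimate, not a measured result.

```python
LOAD = 1000          # workflows per iteration ("load: 1k")
ITERATIONS = 7000    # iterations in the long run
DAYS = 3             # approximate duration stated above
SECONDS_PER_DAY = 24 * 60 * 60

total_workflows = LOAD * ITERATIONS
duration_seconds = DAYS * SECONDS_PER_DAY
approx_rate = total_workflows / duration_seconds

print(total_workflows)        # prints 7000000
print(duration_seconds)       # prints 259200
print(round(approx_rate, 1))  # prints 27.0
```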