This page describes how BOSS performance is tested.
Rate
- "Rate" measures the throughput of BOSS: how many workflows it can handle in one second. Measuring the "Rate" at different workflow request scales shows what BOSS is capable of at different service levels.
Test Method
- Getting the "Rate" is straightforward. In each test case, a specified number of workflows (such as 3000, 20000...) is sent to BOSS, either in one shot or iteratively, and the corresponding start and end times are recorded. The "Rate" for that test case is the number of workflows divided by the elapsed time.
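As a minimal sketch of that calculation (the function name `workflow_rate` and the use of timestamps in seconds are assumptions made for illustration; they are not taken from the test suite):

```python
def workflow_rate(workflow_count, start_time, end_time):
    """Return workflows handled per second between two timestamps (in seconds)."""
    duration = end_time - start_time
    if duration <= 0:
        raise ValueError("end_time must be later than start_time")
    return workflow_count / duration


# Hypothetical numbers: 3000 workflows handled in 120 seconds.
print(workflow_rate(3000, 0.0, 120.0))  # 25.0
```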
Test Environment
Test Scripts
We created a test suite (http://gitorious.org/boss-performance-test) that includes:
- A client that sends (that is, "launches") multiple workflows to BOSS at once.
- A logger plug-in for the engine that records relevant internal engine messages, which are later used to calculate the durations of workflows, participants, etc.
- A utility to analyze the timing data. It takes the earliest start time and the latest end time across all workflows to calculate the overall duration.
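The duration calculation done by the analysis utility can be sketched roughly as follows. This is not the utility's actual code: the `(start, end)` tuple layout and the name `overall_duration` are assumptions, and the sample timings are made up.

```python
def overall_duration(intervals):
    """Return latest end minus earliest start.

    intervals: iterable of (start, end) timestamps in seconds,
    one pair per workflow (assumed layout).
    """
    intervals = list(intervals)
    if not intervals:
        raise ValueError("no workflow timings recorded")
    earliest_start = min(start for start, _ in intervals)
    latest_end = max(end for _, end in intervals)
    return latest_end - earliest_start


# Made-up timings for three workflows that overlap in time.
timings = [(10.0, 12.5), (10.2, 14.0), (11.0, 13.0)]
duration = overall_duration(timings)
print(duration)                 # 4.0
print(len(timings) / duration)  # 0.75 workflows per second
```

Using the overall window rather than summing per-workflow durations matters because workflows run concurrently, so their individual durations overlap.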
Test Cases
In this set, we executed the following cases:
* Request 5000 workflows repeatedly (send another 5000 workflows to the engine after the previous 5000 have finished);
Repeat 20 times (100k workflows handled in total)
* Request 1000 workflows repeatedly (send another 1000 workflows to the engine after the previous 1000 have finished);
Repeat 100 times (100k workflows handled in total)
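The repeated-batch pattern used by both cases can be sketched as a simple driver loop. This is only an illustration: `launch_batch` is a hypothetical callable that sends a batch of workflows and blocks until all of them have finished, and it does not correspond to a known function in the test suite.

```python
import time


def run_batches(launch_batch, batch_size, repeats, clock=time.time):
    """Launch `repeats` batches one after another and return per-batch rates.

    launch_batch(n) is a hypothetical callable that sends n workflows
    and returns only when all n have finished.
    """
    rates = []
    for _ in range(repeats):
        start = clock()
        launch_batch(batch_size)  # next batch starts only after this returns
        elapsed = clock() - start
        rates.append(batch_size / elapsed if elapsed > 0 else float("inf"))
    return rates
```

With a real client, the two cases above would correspond to `run_batches(launch_batch, 5000, 20)` and `run_batches(launch_batch, 1000, 100)`. Comparing the first and last entries of the returned list is one way to check whether the rate drops over time.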
From the results of these runs, we can observe the following:
* Rates barely decrease over time (for both the 1k and 5k cases, they stay around 25 workflows per second)
* CPU and disk usage are high during testing
* Memory is still held after the workflows have finished. This looks like a memory leak, but it may not be one, considering the warm cache (or GC) behavior of languages like Ruby and Python
* No workflows were lost
* The engine did not crash
Summary of Issues Found
This bug fix addresses the channel closing issue and resolved the following issues found earlier (http://wiki.meego.com/User_talk:Pennymax):
* Crash
* Workflow loss
* Performance degradation after running for a while
- Memory leaking: memory used by the engine remained allocated even after all workflows had finished
TODO
- Distributed workers
- Other storage
- Investigate the issues found