Qumranet made a big splash at BriForum last month. Those of you who were there saw a demo of their "spice" remote display protocol, which showed a four-monitor Windows XP desktop running remotely with full-motion high-def video, Skype, IE... the works. Spice is one of the most amazing things I've seen in this industry in a long time. It's 100% software-based and available now as part of Qumranet's "Solid ICE" VDI product.
If you've never seen spice, here's a short little video I shot of it when I visited Qumranet's office a few months ago:
Make no mistake: Spice takes bandwidth. A lot of bandwidth. Exactly how much depends on how many screens you have and what you're doing. But obviously if you're watching full-motion high-def video that is a few gigabytes in size even in its compressed form, this isn't going to work over a 20 kbps connection. Office apps could easily take "normal" amounts of bandwidth, but with four monitors, high-def video, Skype, and the works, spice could easily top 100 Mbps.
(Perhaps this is a conversation for another day, but personally I'm fine with this. Spice is a LAN solution for environments that need more than what ICA or RDP can do. And if you need this, then you understand that bandwidth is important too. Of course if you go down this route, you probably already have 100-megabit or gigabit switched Ethernet to your desktops. And finally, yes, spice really is a LAN-only protocol. In fact, if you make a Solid ICE connection over a WAN, Qumranet figures, "well, since you have a WAN, you already value remote access more than performance," and it just drops down to vanilla RDP.)
But the point is that this article is not about spice. The point is that in addition to using the spice protocol, Qumranet's Solid ICE product uses the "KVM" hypervisor instead of Xen or VMware or Hyper-V. (Remember last week's little conversation about KVM?) Qumranet feels that KVM is a better hypervisor for VDI environments than anything else on the market, and they've run some tests that they feel prove it.
The problem is that since Qumranet is a vendor, people won't really trust its benchmarking results; obviously Qumranet would have a lot to gain if KVM is more efficient than the alternatives.
Therefore, Qumranet has hired Gabe and me to conduct performance tests of their product and to compare it to other leading VDI products. Gabe and I will be doing that work next week, and we're spending this week putting together our test scripts and plans.
We've published performance results in the past, and inevitably someone posts a comment like "Your results are crap because you didn't do xxxx" or "Why didn't you configure option xxx?"
So this time, we're turning this model around. We're inviting the entire community to review our plans, and we hope to address any perceived problems ahead of time.
Background information about this performance test project
First, just to make sure there is no gray area, Qumranet is paying us to conduct this test and to publish the results.
Second, we are going to test Qumranet's Solid ICE product, Citrix XenDesktop (using XenServer as the VDI host), and VMware's VDI solution.
Since Qumranet is paying for the test, they will obviously provide an engineer or two for us during the testing who can answer any questions that might come up. But again, we want to make this test as fair as possible, so we contacted Citrix and VMware and let them know what we're up to. Both Citrix and VMware have agreed to make engineers available to us during the testing as well.
Also, many of you are probably aware that VMware's EULA does not permit public disclosure of benchmark or performance testing without prior approval. Part of me wants to say "F You" to that and publish our results anyway, but part of me wants to do the "right" thing and try to get pre-approval from VMware.
I talked to my contacts over there, and it turns out that this whole pre-approval thing is pretty easy. Basically we told them what we were doing and why, talked about our scripts and our methods and stuff, and agreed to provide them with the full results, and they were cool with it.
And for us, this is something we wanted to do anyway, since we want Citrix, VMware, and the whole community to view our tests as "valid."
That said, let's take a look at what we're planning on doing.
Our testing methodology
For this project, we're testing the efficiency of the hypervisor when it runs VDI loads. VMware has their VMmark standard benchmark test suite, but unfortunately that is for server workloads only and does not include VDI use cases. We'd also love to use Login Consultants' LoginVSI suite, but that's still in beta and only available for TS environments currently. (Plus there is some work they need to do on randomization which I'll talk about later.)
The bottom line is that we're pretty much on our own so far as building the test environment.
Our fundamental idea is to do this in a way that's more or less similar to the way we do terminal server tests. We'll write an AutoIt script, run it in a whole bunch of Windows XP VMs, and then just see how many VMs we can throw on a box before the performance gets too bad.
The only real "catch" here is that we really want to simulate the "randomness" of real-world desktop users. Today's hypervisors do a really great job of caching and memory sharing and all kinds of things, so if you have 50 users in 50 VMs all running the exact same script, your lab tests will show user densities a lot higher than what you can get in the real world. So we want to write scripts that have different users doing different things.
We feel the easiest way to do this is to create small activity "modules" that we can reuse and recombine to create our user scripts. We want to create maybe 50 or 100 different modules. Some modules will be simple, like opening Notepad, jotting down some notes, saving the file, and closing Notepad. Others will be more complex: loading large Word docs, running find and replace and spell check, embedding an OLE Excel chart, etc.
Then once we have all of our modules built, we can just run them all in random order in each VM.
Now of course we need to be able to run the exact same tests on all three platforms, so we can't randomize at run time. Instead, we need to "pre-randomize" our scripts so that we can run the same script modules in the same order every time we run the test. Ultimately we're planning on creating a script for each user with a random selection of modules. Then we can run these scripts in the various VMs and get our random experience, but still have the same randomness from test-to-test.
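The "pre-randomization" step can be sketched in a few lines. This is a minimal illustration in Python, not our actual generator: the module names, the `build_user_scripts` function, and the seed value are all hypothetical. The key idea is that a single seeded random number generator always produces the same sequences, so every platform runs the identical "random" workload.

```python
import random

# Hypothetical module names, for illustration only; the real suite
# would have somewhere around 50 to 100 of these.
MODULES = [
    "notepad_notes",
    "word_large_doc",
    "word_find_replace",
    "word_spell_check",
    "excel_ole_chart",
    "ie_browse",
]


def build_user_scripts(modules, num_users, seed, modules_per_user=None):
    """Return a fixed, pseudo-random module order for each user.

    Each user gets a random selection of modules (all of them, in random
    order, if modules_per_user is None). Because the generator is seeded,
    calling this again with the same arguments returns identical scripts.
    """
    rng = random.Random(seed)  # private generator; global random state is untouched
    k = len(modules) if modules_per_user is None else modules_per_user
    scripts = {}
    for n in range(1, num_users + 1):
        # sample() picks k distinct modules and returns them in random order
        scripts[f"user{n:03d}"] = rng.sample(modules, k)
    return scripts


if __name__ == "__main__":
    for user, order in build_user_scripts(MODULES, num_users=3, seed=2008).items():
        print(user, " -> ".join(order))
```

Each list would still need to be turned into a per-user AutoIt script (or read by a generic runner script inside each VM); that translation step isn't shown here.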
We're also planning on randomizing the reaction speed of the users. AutoIt allows us to set global variables that specify how fast a user performs their actions after a screen pops up. We'll randomly configure each script with a value somewhere between 1/4 second and 3 seconds.
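Randomized reaction times can be pre-assigned the same way, so they also stay fixed from run to run. This sketch assumes the delay is stored as whole milliseconds (250 to 3,000) and later written into each user's AutoIt script; the function name and seed are hypothetical, and it doesn't show which AutoIt setting the value would feed.

```python
import random

MIN_DELAY_MS = 250    # 1/4 second
MAX_DELAY_MS = 3000   # 3 seconds


def assign_reaction_times(user_ids, seed):
    """Map each user to a fixed random reaction delay in milliseconds."""
    rng = random.Random(seed)
    # Sort first so the result depends only on the set of users and the seed,
    # not on the order the IDs were passed in. randint() includes both ends.
    return {uid: rng.randint(MIN_DELAY_MS, MAX_DELAY_MS) for uid in sorted(user_ids)}
```

Using a separate seeded generator for reaction times means that changing how many modules each user runs doesn't shift the reaction times, and vice versa.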
As our scripts run, we'll have them write the run time of each module to a CSV log file. When we're done we should have a huge log file with thousands and thousands of module run times, along with the corresponding data such as the user's reaction time and how many VMs were running on that box.
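A rough sketch of the logging and analysis side follows, assuming each VM appends header-less rows to its own CSV file and the files are concatenated afterward. The column layout, the function names, and the 50% slowdown tolerance used to decide when performance is "too bad" are all hypothetical; none of them are settled.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical column layout for each logged module run.
FIELDS = ["vm_count", "user", "reaction_ms", "module", "seconds"]


def format_row(vm_count, user, reaction_ms, module, seconds):
    """Return one header-less CSV line describing a single module run."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(
        [vm_count, user, reaction_ms, module, f"{seconds:.3f}"]
    )
    return buf.getvalue()


def median_by_vm_count(lines, module):
    """Median run time of one module at each VM density.

    `lines` can be an open file or any iterable of CSV lines.
    """
    times = defaultdict(list)
    for row in csv.DictReader(lines, fieldnames=FIELDS):
        if row["module"] == module:
            times[int(row["vm_count"])].append(float(row["seconds"]))
    return {n: statistics.median(t) for n, t in sorted(times.items())}


def max_acceptable_density(medians, tolerance=0.5):
    """Highest VM count whose median stays within (1 + tolerance) of the
    lightest-loaded run; stops at the first density that exceeds it."""
    counts = sorted(medians)
    limit = medians[counts[0]] * (1 + tolerance)
    best = counts[0]
    for n in counts:
        if medians[n] > limit:
            break
        best = n
    return best
```

In practice each VM would call something like `fh.write(format_row(...))` on a file opened with `open(path, "a", newline="")`, and the analysis would look at every module (or a weighted mix of them) rather than a single one.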
What we will make public
Since Qumranet is paying for this project, they want us to summarize our results in a few pages with some nice charts. However, we think this work has a lot of value for the community at large, so we are also committed to releasing all of our test scripts and raw data, including:
- All the individual activity modules
- Our "script to create the scripts" that will create our 100 (or whatever) "pre-randomized" scripts
- The 100 individual user scripts
- A complete description of our Windows XP setup so that you can get these scripts running in your environment
- A full dump (Excel, CSV, SQL? dunno yet) of our raw results, with every module for every user and the timing for all three platforms.
To be clear, we're not trying to create a new industry benchmark. Instead, we want to say "Here are some tests that we've done. Here's how we did them. Here are the results. Here's what you need to know to run similar tests yourself."
A few other notes about this Qumranet test
There are a few other specifics of this project that are probably worth talking about.
First, we're going to run these tests on "normal" hardware, probably something like an 8-core server with 16GB of RAM. This test is not about building the biggest server in the world to serve the most VMs. It's about seeing how the various hypervisors perform on normal servers with real-world workloads.
Second, we're testing the hypervisors, not the protocols. A complete "solution" test would require analyzing the performance of the screens across the network and all sorts of things. In our case, we're just going to run the tests and time them from the server. (By the way, we're not even really concerned with CPU or memory utilization. We just want to run these modules and see how the overall "slowness" of the server is affected.)
Questions we have
So now that you understand our plan, what do you think? Specifically,
- Do you think these tests are "valid"? If not, why not? What should we do differently?
- What "activity modules" should we build? What apps should we install / simulate?
- Is there anything special that we should do when we install each of the three products?
- Is there anything else we're not thinking about?
Since we're running these tests next week, we really hope to be able to get the results published the week after that. I'm sure that's where the "real" comments will be made, but I'd like to "pre-address" anyone's concerns now.