Gabe and I are in Sunnyvale, Calif., this week working with Qumranet. Last week we wrote a blog post about our project describing our plans and asking the community for input as to how we should conduct our testing. (The short version is that we are evaluating Qumranet, Citrix, and VMware in the VDI space, and we had some questions about the best way to conduct our tests. Please read that original article before you read this article.)
Yesterday was the first day of our project, and today's article shares the final test plan that we put together (based on everyone's feedback) as well as an update on yesterday's progress.
Our (revised) testing methodology
Last week we wrote that our tests would focus on stressing the three vendors' respective hypervisors: KVM for Qumranet, Xen for Citrix, and ESX for VMware. Since we made that original plan, we've talked to several more people and received a LOT of feedback and comments on the article. After sleeping on this for a few days, we decided (or agreed with the people who said) that there really isn't that much value in testing just the hypervisor.
The problem that many people pointed out in the comments (and many people we talked to told us) is that it's easy to make a hypervisor that can crank through these AutoIT scripts as fast as possible, but what does that actually mean for end-user experience? Who cares if a script can run 10% faster if the VM host is ignoring clients and not sending screenshots down?
Therefore we decided that we'll look at several components of the VDI solutions, including:
- The density of users / VMs on a hypervisor running a desktop workload
- The thin provisioning process, performance, and management
- The remote display protocol performance
The user density test is what we were originally planning to do, so nothing new there. The thin provisioning process is interesting, because a lot of people asked us whether we would evaluate Citrix Provisioning Server this week. Qumranet actually has thin provisioning themselves (where many users can share the same master image, but each with their own diffs), so this would be a great test to do. And then of course there's Qumranet's spice protocol, which as we wrote last week can support multiple displays with near-perfect quality, even with stuff like Skype and multimedia.
So this new plan is exciting for us because we'll be testing three components of the solution instead of just one. Let's look a bit more at how we'll run each of these tests.
The user density / hypervisor test
Last week we'd planned on writing our own AutoIT-based scripts to simulate office workers. Since then, someone pointed out that the Business Application Performance Corporation (Bapco) has an industry standard benchmark called SysMark that does almost this exact same thing. (They even have an "Office Productivity" benchmark.) So rather than try to write all of our own scripts, we decided that we'd just use the industry standard benchmark.
We can launch SysMark from the command-line in each VM. We're planning to just loop the SysMark tests and add more and more VMs to the test. Then about every five VMs or so, we'll run a single SysMark test that we'll get certified with Bapco. We're also planning to video record this process, so that we can publish videos that show things like "here's the experience with a SysMark score of 100. And here's the experience with 120." Etc.
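To make the looping idea concrete, here's a minimal sketch of the kind of in-VM driver we have in mind. Note that `sysmark.exe` and the `/project` flag are placeholders, since we haven't documented SysMark's actual command-line syntax here; the point is just the structure: build one command line per iteration, then run them back-to-back.

```python
import subprocess

# Hypothetical sketch -- "sysmark.exe" and "/project" are placeholders
# for whatever the real SysMark launcher and flags turn out to be.
def benchmark_commands(runs, project_prefix="vdi-density"):
    """Build one command line per SysMark iteration, each with a
    uniquely named project so the results don't overwrite each other."""
    cmds = []
    for i in range(1, runs + 1):
        cmds.append([
            "sysmark.exe",                          # placeholder launcher
            "/project", f"{project_prefix}-{i:03d}",  # placeholder flag
        ])
    return cmds

def run_loop(runs):
    """Run the benchmark passes back-to-back inside the VM."""
    for cmd in benchmark_commands(runs):
        subprocess.run(cmd, check=True)  # block until this pass finishes
```

Each VM would just call something like `run_loop(999)` and we'd let the passes pile up while we add more VMs to the host.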
We also decided that we're not going to include multimedia in these hypervisor loading tests. (We still plan on evaluating multimedia, just not as part of this test set.) The reasons for this decision are twofold. First, no one can agree on what percentage of users should run multimedia in our tests. Some want 50%. Some want 20%. Some say that at any given time, it won't be any more than 1 or 2%. So we decided that we'd evaluate the experience of different workers with office apps as our baseline, and then we'd see how multimedia affects that later.
The second reason we decided not to do multimedia at the same time as the hypervisor loading is that different vendors accelerate and enhance multimedia in different ways. For example, VMware uses a multimedia redirection technology where they just send the original media file to the client where it's rendered locally. The problem is that you actually have to have a client device connected to the VM in order for this to work. If you just fire up a VM and run a script with no client connected, the VM can't leverage the multimedia redirection technology, which means it's forced to render the media stream itself. This would artificially lower the performance or the number of VMs we could get on a server.
So if we connect clients to the hosts for our hypervisor testing, can those clients be VMs? If so, do they have to have real screens displaying, or can they run headless? This issue becomes really complex really fast. These are all great questions that I hope someone can answer, but we simply don't have the time in our one-week evaluation to explore all of these options.
Therefore for hypervisor loading, we're going to focus on office apps running in a headless way. (Again, keep reading to learn about how we're going to test multimedia.)
The thin provisioning test
Citrix's Provisioning Server has been the leading product in this space for years, although as I said Qumranet does thin provisioning and can stream these images on-demand to clients via NFS. VMware previewed a similar technology they called "SVI" (Scalable Virtual Images) at VMworld in Cannes earlier this year, but unfortunately that product is not out yet. So for this week's testing, we can only analyze Citrix Provisioning Server and Qumranet Solid ICE.
Remote display protocol performance
In the past I've written about how awesome Qumranet's spice protocol is, and I've linked to our YouTube video showing it running on a client with four displays in a heavy multimedia environment. Since then, a lot of people have criticized spice, saying things like, "sure that test is cool, but it requires 60 Mb per second of bandwidth. ICA could do that too."
My response to this is that I honestly don't know. Citrix ICA connecting to a XenDesktop client with four screens, multimedia, Skype, nick.com, etc. with unlimited network bandwidth? How well would ICA perform? And on the flipside of that, how about Qumranet spice running just regular business apps on a single display? How close is that bandwidth usage to ICA? And what about user experience? Bandwidth is only part of this equation. How good is the experience with certain apps at certain bandwidth levels?
We decided to answer all of these questions this week. We're going to compare ICA and spice head-to-head in many different scenarios: business apps, multimedia, single displays, multiple displays, etc. We'll look at bandwidth numbers and we'll also use an external video camera to record the performance as experienced by the user.
For these multimedia and protocol tests, we'll also run them with plain RDP, just for fun. (Although that will, of course, be limited to a single screen.) But this should be pretty cool: spice, ICA, and RDP, all head-to-head in different scenarios, all with network loads tested, and all recorded on video for head-to-head comparisons.
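The bandwidth side of the comparison is straightforward arithmetic once you have a packet capture. As a sketch, assuming we export each capture to a simple list of (timestamp, frame size) samples (the export format itself is an assumption), the average throughput works out like this:

```python
def avg_throughput_mbps(samples):
    """Average throughput in megabits/sec from capture samples.

    samples: list of (timestamp_sec, frame_bytes) tuples, in time order,
    e.g. as exported from a Wireshark capture of one protocol session.
    """
    if len(samples) < 2:
        return 0.0  # can't compute a rate from a single frame
    duration = samples[-1][0] - samples[0][0]   # seconds of capture
    total_bits = sum(size * 8 for _, size in samples)
    return total_bits / duration / 1e6
```

That gives us one comparable number per protocol per scenario, which we can then line up against the recorded user experience for the same session.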
VMware's new version of VDI enhances the out-of-the-box RDP with multimedia redirection. We'd like to test that too, but we haven't been able to get a license from them yet, so it looks like that will remain untested. (More on that next.)
VMware withdraws from testing
Many of you probably have heard that VMware "does not allow people to publish benchmarks." This is somewhat true. The license agreement in VMware products states that all performance testing results must be approved by VMware before they can be published. It doesn't seem to me that they can legally enforce that, but personally I don't want to be sued by them to find out. So for this project, we did the "right" thing and contacted VMware and shared our plans with them.
We went back-and-forth several times over the past few weeks, discussing our test plan and methodology and all sorts of things. As you know, our plans have changed quite a bit, and I think VMware was starting to get a bit frustrated with us since we couldn't seem to make up our minds about what exactly we'd test and how we'd run the tests.
Yesterday we told VMware that we would not be doing a full VMware VDI test (since VDI does not include thin provisioning and it just uses the RDP protocol). Instead we'd focus on Citrix XenDesktop versus Qumranet Solid ICE. By this point we still hadn't received our evaluation licenses from VMware (even though they said we should have had them last week). So yesterday we told them that we might still be able to test ESX with XenDesktop if they could still get us our licenses. (Our thought process was that since Citrix XenDesktop integrates so well with XenServer and ESX, why not run the "SysMark" portion of our tests on ESX and Xen?)
VMware responded by telling us that since we're not testing the full VDI product, they're pulling our evaluation license request. They told us that maybe we could re-engage in the future.
We responded to let them know that we still think evaluating VMware with XenDesktop makes sense, especially since Citrix really leads the market in the provisioning and protocol areas, but VMware leads in the hypervisor area. So running XenDesktop with ESX-based guests is probably a very popular choice. Unfortunately VMware never wrote back, so I guess we're done with them for now.
VMware is just very weird to work with in this regard.
On the flip-side, Citrix has been really awesome to work with. Anyone can easily download a 99-user, 90-day, fully featured evaluation of the entire XenDesktop suite right from their website, and anyone is allowed to say whatever they want about their products.
Citrix XenDesktop Hangs
Building our environment and getting everything set up yesterday went really well, except for a few little problems with XenDesktop that we need to sort out.
The first is that we simply cannot get our session to use all four of the displays that we have on our Windows XP client workstation. All displays are 1024x768, and they're configured in a straight row from left-to-right. If we maximize the ICA session, it just fills one screen. If we view the display properties within the session, we do see the four virtual display adapters, but displays 2, 3, and 4 are disabled. If we click the box to enable them, they light up, but as soon as we hit the "apply" button, they switch back to disabled. We're using version 10.250.0.8292 of the desktop receiver client that came with the XenDesktop ISO. We tried to look online for newer clients, but man, Citrix's new download website makes it really hard to find things. We're not sure whether there is a newer client or whether we just screwed something up.
The other problem we're having with XenDesktop is that we found a technique that can consistently hang a VM. Our XenDesktop setup is very simple. We have a domain controller, then another 2003 server running all of the XenDesktop components (the desktop broker, web interface, licensing, and farm database). Our desktop host is a Dell 1950, 8 cores, 16 GB, with XenServer 4.1.
One of the tests we wanted to do was to see just how well ICA could do with multimedia and a lot of action over four displays. Since we couldn't get four displays working, we decided to just work with one display for the time being. If we click our movie, it plays with no problem. It will play as long as we want. But when we grab the top of the media player window and start dragging the window around, our session / display freezes. When this happens, the audio stops too. We had netmon open on one of our other displays (because hey, we weren't using that for ICA :), and when this freeze occurs, the network utilization drops to zero. Since this was so repeatable, we actually captured a Wireshark network trace. It's weird: the Windows XP VM just stops sending packets.
If we look at the stats of that VM via XenCenter, we see it at around 45-50% CPU utilization during playback. But when this freeze occurs, the CPU drops down to 2%.
So we're completely stumped. We're hoping to get on the phone with someone at Citrix today to work through these two issues. We did record a video which we posted to YouTube showing this whole process. If anyone has any ideas about this problem, we promise you lots of cool coverage on BrianMadden.com!
(A side note about YouTube: If you visit the actual YouTube web page for this video, there is a "watch in high quality" link under the video so you can see it at its native resolution.)