Brian Madden's Blog

We're planning to benchmark Qumranet (KVM) vs. Citrix XenDesktop vs. VMware ESX for VDI workloads. Help us design the tests.

Written on Jul 09 2008 5,729 views, 44 comments


by Brian Madden

Qumranet made a big splash at BriForum last month. Those of you who were there saw a demo of their "spice" remote display protocol, which showed a four-monitor Windows XP desktop running remotely with full motion high def video, skype, IE... the works. Spice is one of the most amazing things I've seen in this industry in a long time. It's 100% software-based and available now as part of Qumranet's "Solid ICE" VDI product.

If you've never seen spice, here's a short little video I shot of it when I visited Qumranet's office a few months ago:

Make no mistake: Spice takes bandwidth. A lot of bandwidth. Just how much bandwidth depends on how many screens you have and what you're doing. But obviously if you're watching full-motion high-def video that in its compressed codec state is a few gigabytes in size, this isn't quite going to work over a 20kbps connection. Office apps could easily take "normal" amounts of bandwidth, but with four monitors, high-def video, skype, and the works, we could easily get spice up over 100mbps.

(Perhaps this is a conversation for another day, but personally I'm fine with this. Spice is a LAN solution for environments that need more than what ICA or RDP can do. And if you need this, then you understand that bandwidth is important too. Of course if you go down this route, you probably already have 100 meg or gigabit switched ethernet to your desktops. And finally, yes, spice really is a LAN-only protocol. In fact if you make a Solid ICE connection over a WAN, Qumranet thinks, "well, since you have a WAN, you are already valuing remote access more than performance," and they just drop down to using vanilla RDP.)

But the point is this article is not about spice. The point is that in addition to using the spice protocol, Qumranet's Solid ICE product uses that "KVM" hypervisor instead of Xen or VMware or Hyper-V. (Remember last week's little conversation about KVM?) Qumranet feels that KVM is a better hypervisor for VDI environments than anything else on the market, and they've run some tests that they feel prove it.

The problem is that people won't really trust a vendor's own benchmarking results, since obviously Qumranet would have a lot to gain if KVM is more efficient than something else.

Therefore, Qumranet has hired Gabe and me to conduct performance tests of their product and to compare it to other leading VDI products. Gabe and I will be doing that work next week, and we're spending this week putting together our test scripts and plans.

We've published performance results in the past, and inevitably someone posts a comment like "Your results are crap because you didn't do xxxx." Or "why didn't you configure xxx option."

So this time, we're turning this model around. We're inviting the entire community to review our plans, and we hope to address any perceived problems ahead of time.

Background information about this performance test project

First, just to make sure there is no gray area, Qumranet is paying us to conduct this test and to publish the results.

Second, we are going to test Qumranet's Solid ICE product, Citrix XenDesktop (using XenServer as the VDI host), and VMware's VDI solution.

Since Qumranet is paying for the test, they will obviously provide an engineer or two for us during the testing who can answer any questions that might come up. But again, we want to make this test as fair as possible, so we contacted Citrix and VMware and let them know what we're up to. Both Citrix and VMware have agreed to make engineers available to us during the testing as well.

Also, many of you are probably aware that VMware's EULA does not permit public disclosure of benchmark or performance testing without prior approval. Part of me wants to say "F You" to that and publish our results anyway, but part of me wants to do the "right" thing and try to get pre-approval from VMware.

I talked to my contacts over there, and it turns out that this whole pre-approval thing is pretty easy. Basically we told them what we were doing and why, talked about our scripts and our methods and stuff, and agreed to provide them with the full results, and they were cool with it.

And for us, this is something we wanted to do anyway, since we want Citrix, VMware, and the whole community to view our tests as "valid."

That said, let's take a look at what we're planning on doing.

Our testing methodology

For this project, we're testing the efficiency of the hypervisor when it runs VDI loads. VMware has their VMmark standard benchmark test suite, but unfortunately that is for server workloads only and does not include VDI use cases. We'd also love to use Login Consultants' LoginVSI suite, but that's still in beta and only available for TS environments currently. (Plus there is some work they need to do on randomization which I'll talk about later.)

The bottom line is that we're pretty much on our own so far as building the test environment.

Our fundamental idea is that we'll do this in a way that's more-or-less similar to the way we do terminal server tests. We'll write an AutoIT script, run it in a whole bunch of Windows XP VMs, and then just see how many VMs we can throw on a box before the performance gets too bad.

The only real "catch" here is that we really want to simulate the "randomness" of real-world desktop users. Today's hypervisors do a really great job of caching and memory sharing and all kinds of things, so if you have 50 users in 50 VMs all running the exact same script, your lab tests will show user densities a lot higher than what you can get in the real world. So we want to write scripts that have different users doing different things.

We feel the easiest way to do this is to create small activity "modules" which we can re-use and re-combine to create our user scripts. We want to create maybe 50 or 100 different modules. Some modules will be simple, like opening notepad, jotting down some notes, saving the file, and closing notepad. Some will be more complex: loading large word docs, find and replace, spell check, embed OLE Excel chart, etc.

Then once we have all of our modules built, we can just run them all in random order in each VM.

Now of course we need to be able to run the exact same tests on all three platforms, so we can't randomize at run time. Instead, we need to "pre-randomize" our scripts so that we can run the same script modules in the same order every time we run the test. Ultimately we're planning on creating a script for each user with a random selection of modules. Then we can run these scripts in the various VMs and get our random experience, but still have the same randomness from test-to-test.
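To make the "pre-randomize" idea concrete, here's a rough sketch in Python (not our actual AutoIT tooling) of a "script to create the scripts." It assumes a fixed seed is what keeps the randomness identical from run to run. The module names, the seed value, and the 20-modules-per-user figure are all placeholders made up for illustration.

```python
import random


def build_user_scripts(module_names, num_users, modules_per_user, seed):
    """Return one fixed, reproducible module sequence per simulated user."""
    # A fixed seed means every run of this generator produces the same plan,
    # so all platforms replay identical "random" workloads.
    rng = random.Random(seed)
    scripts = {}
    for user in range(1, num_users + 1):
        # Sample with replacement so a user may repeat an activity.
        scripts[f"user{user:03d}"] = [
            rng.choice(module_names) for _ in range(modules_per_user)
        ]
    return scripts


# Hypothetical module names; the real list would have 50 to 100 entries.
modules = ["notepad_notes", "word_find_replace", "word_spell_check", "excel_ole_chart"]
plan = build_user_scripts(modules, num_users=100, modules_per_user=20, seed=2008)
print(plan["user001"][:5])
```

In practice the generated plan would be written out once as the per-user scripts, and those same files would then be copied to every platform rather than regenerated.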

We're also planning on randomizing the reaction speed of the users. AutoIT allows us to set global variables that specify how fast a user performs their actions after a screen pops up. We'll randomly configure each script for something in the 1/4 second to 3 second range.
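The reaction speed can be pre-randomized the same way. This is only a sketch in Python: it assumes a uniform spread across the 1/4 second to 3 second range (we haven't settled on a distribution), and the seed and user naming are placeholders. How the value gets fed into AutoIT's global settings is left out.

```python
import random

MIN_DELAY_S = 0.25  # fastest simulated user, from the range in the text
MAX_DELAY_S = 3.0   # slowest simulated user


def assign_reaction_delays(user_ids, seed):
    """Give each user a fixed reaction delay, reproducible across test runs."""
    rng = random.Random(seed)
    # Uniform distribution is an assumption; real users are likely skewed.
    return {
        user: round(rng.uniform(MIN_DELAY_S, MAX_DELAY_S), 2) for user in user_ids
    }


users = [f"user{n:03d}" for n in range(1, 101)]
delays = assign_reaction_delays(users, seed=2008)
print(delays["user001"])
```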

As our scripts run, we'll have them dump out the run times of each module to a CSV log file. When we're done we should have a huge log file with thousands and thousands of module run times (as well as the corresponding data like what the user's reaction time was as well as how many VMs were running on that box.)
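For a feel of what one row of that log might look like, here's a small Python sketch. The column names, the platform label, and the stand-in "module" are all assumptions for illustration; the real logging will happen inside the AutoIT scripts, and the final column layout isn't decided.

```python
import csv
import io
import time

# Hypothetical column layout for the results log.
FIELDS = ["platform", "vm_count", "user", "module", "reaction_delay_s", "run_time_s"]


def timed_run(module_func):
    """Run one activity module and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    module_func()
    return time.perf_counter() - start


def log_module_run(writer, platform, vm_count, user, module, reaction_delay_s, run_time_s):
    """Append one module timing, with its context, to the CSV log."""
    writer.writerow({
        "platform": platform,
        "vm_count": vm_count,
        "user": user,
        "module": module,
        "reaction_delay_s": reaction_delay_s,
        "run_time_s": f"{run_time_s:.3f}",
    })


# An in-memory buffer stands in for the log file so the sketch has no side effects.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()

elapsed = timed_run(lambda: sum(range(1000)))  # placeholder for a real module
log_module_run(writer, "platform_a", 30, "user001", "notepad_notes", 1.5, elapsed)
print(buffer.getvalue())
```

Keeping the VM count and reaction delay on every row makes the file bigger, but it means any single row can be interpreted without joining against another table.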

What we will make public

Since Qumranet is paying for this project, they want us to summarize our results and write a few pages and make some nice charts about our findings. However, we think there is a lot of value for the community at large, so we are also committed to releasing all of our test scripts and raw data, including:

  • All the individual activity modules
  • Our "script to create the scripts" that will create our 100 (or whatever) "pre-randomized" scripts
  • The 100 individual user scripts
  • A complete description of our Windows XP setup so that you can get these scripts running in your environment
  • A full dump (Excel, CSV, SQL? dunno yet) of our raw results, with every module for every user and the timing for all three platforms.

We understand that we're not trying to create a new industry benchmark. Instead, we want to say "Here are some tests that we've done. Here's how we did them. Here are the results. Here's what you need to know to run similar tests yourself."

A few other notes about this Qumranet test

There are a few other specifics of this project that are probably worth talking about.

First, we're going to run these tests on "normal" hardware. Probably something like an 8-core server with 16GB RAM. This test is not about building the biggest server in the world to serve the most VMs. It's about seeing how the various hypervisors perform on normal servers with real-world workloads.

Second, we're testing the hypervisors, not the protocols. A complete "solution" test would require analyzing the performance of the screens across the network and all sorts of things. In our case, we're just going to run the tests and time them from the server. (By the way, we're not even really concerned with CPU or memory utilization. We just want to run these modules and see how the overall "slowness" of the server is affected.)
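One way to turn "overall slowness" into a number is to compare the average module run time at each VM count against the lightest load tested. The Python sketch below does that. The 1.5x cutoff is an arbitrary assumption, and the sample timings are invented purely to exercise the code; they are not measurements.

```python
from collections import defaultdict
from statistics import mean


def mean_time_by_vm_count(rows):
    """Average module run time (seconds) for each VM count, sorted by VM count."""
    buckets = defaultdict(list)
    for vm_count, run_time_s in rows:
        buckets[vm_count].append(run_time_s)
    return {vm: mean(times) for vm, times in sorted(buckets.items())}


def max_vms_within(means, max_slowdown):
    """Largest VM count reached before slowdown vs. the lightest load exceeds the cutoff."""
    baseline = means[min(means)]
    best = None
    for vm_count in sorted(means):
        if means[vm_count] / baseline > max_slowdown:
            break  # stop at the first load level that is "too slow"
        best = vm_count
    return best


# Invented (vm_count, run_time_s) pairs, NOT real results.
sample = [(10, 2.0), (10, 2.2), (20, 2.4), (20, 2.6), (30, 4.8), (30, 5.2)]
means = mean_time_by_vm_count(sample)
print(max_vms_within(means, max_slowdown=1.5))
```

Where to draw the "too slow" line is a judgment call, which is one reason the raw timings matter more than any single density number.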

Questions we have

So now that you understand our plan, what do you think? Specifically,

  • Do you think these tests are "valid?" If not, why not? What should we do differently?
  • What "activity modules" should we build? What apps should we install / simulate?
  • Is there anything special that we should do when we install each of the three products?
  • Is there anything else we're not thinking about?

Since we're running these tests next week, we really hope to be able to get the results published the week after that. I'm sure that's where the "real" comments will be made, but I'd like to "pre-address" anyone's concerns now.

Thoughts?



Comments

Kata Tank wrote Difficult to have a full series of test...
on 07-09-2008 8:05 AM

Good luck for you.

Companies will put weight on results depending on their own needs and goals. Some focus on graphical performance, some on WAN access, some on manageability...

Probably good to have a series of groups like:

  • Manageability
  • Performance : user point of view
  • Performance : administrator point of view
  • Resources (server, WAN, ...)
  • Cost

to which everybody can then assign their own value...

Brian Madden wrote Re: Difficult to have a full series of test...
on 07-09-2008 8:13 AM

Yeah I agree that this is the kind of stuff that's needed to make a decision about which VDI solution to buy.

I want to be very clear on the fact that we are not trying to do all these tests. We are not trying to recommend which VDI solution someone should buy. All we are doing is looking at how the Windows XP VMs perform on various hypervisors with a given set of hardware.

Matt Langguth wrote Isn't it the protocol?
on 07-09-2008 8:25 AM

Brian, 


Congrats on having the guts to try and pull a test like this off. I believe this is the first time an independent, albeit vendor sponsored, 3 way test has been performed with the specific intention of documenting VDI performance. I remember seeing Ron Oglesby's VMware vs. Citrix presentation from BriForum 2006 but there have been significant improvements to both products since then.

A few questions/comments come to mind:

- What are you going to be using for backend storage i.e. FC or iscsi?

- Will any of the applications be virtualized, i.e. thinstalled, or will each VM have a local copy of the apps?

- Will all VMs have their own vmdk file or equivalent?

- My biggest concern is how you are treating the protocol. You've commented numerous times on the performance difference between RDP and ICA. The very nature of the implementation requires a user to access the desktop via some remote protocol; in turn the protocol selection greatly affects the user's perception of performance. For example, if we have two VMs with the exact same build running under VMware, a user accessing the VM via ICA would most likely have a more responsive machine than the same user accessing it via RDP.

Guest wrote Some additional vendors
on 07-09-2008 8:32 AM

This is great but I don't see Virtual Iron in the test load, nor do I see Provision Networks' VDI solution with their multimedia redirection software. Since what you are attempting is a mammoth task and what you are looking for is real world stuff minus the marketing waffle, I suggest broadening the basis and also including the connection brokers and offerings in general. VDI is still a niche but it would be good to see which vendor is the better option or is ahead at this point in time. I am confident that if you ask VI and PN to join they will be more than happy to. For me and my company, we do consulting, and it would be a matter of having an independent view on findings instead of trying to convince ourselves that VI and PN is the best option for what we are trying to achieve.

Guest wrote swap
on 07-09-2008 8:51 AM

hello,

for the xp installation

will/could you try with the vm guest swapfile (pagefile.sys) on local disk (not san), on san with the vmdk, and without a swap file at all?

also, could one of the modules trigger a kernel panic or something similar to force the machine to bsod and reboot, to see how long the dump/reboot takes on a heavily loaded system?


yves deglain 

Rene Vester wrote Heavy and graphical load
on 07-09-2008 9:02 AM

Hey Brian,

The people I talk to about implementing VDI solutions seem to be mostly concerned with the resource requirements for "heavy" and graphical applications, for hypervisor, client and network alike.

Could be interesting to see those workloads with their measured effect on both hardware and network.

/LamerSmurf

Guest wrote hypervisor
on 07-09-2008 9:05 AM

forgot to mention: as the test is finally about hypervisor perf in vdi workloads, why don't you test PN virtual access suite vdi on hyper-V and sun VDI?

Bert Bouwhuis wrote While you're on it...
on 07-09-2008 9:17 AM

Might be interesting to also include everyday overhead and management tasks, such as user logon/logoff, O/S upgrading, virus checking, etc.

Siegfried Huijgen wrote Significant or not.....that is the question....
on 07-09-2008 9:50 AM

I think you need to consider when your test results become significant. Let's say you create 10 sets of pre-randomized scripts and let them run on the different platforms. You combine the results and summarize. Is this significant or do you need to re-run the sets, let's say 3 times on all platforms, first compare the results against a standard deviation you previously defined, then combine and summarize the results? Or is running one set of 100 pre-randomized tests significant?

Am looking forward to reading about the test setup and results.

--Cheers--

Siegfried

Guest wrote Provision Networks together with Virtuozzo
on 07-09-2008 1:27 PM

Here is another "Please, please, include me!"

 But if you're comparing the integrated VDI solutions (XenDesktop on XenServer, VDM on ESX, Qumranet on KVM) then I would also be interested in how VAS from Provision Networks in combination with Virtuozzo would compete. Although they are not 1 vendor (yet), they did make a deal that they will offer their solutions in a bundle. I would bet that on performance in relation to price it would come out best.

Guest wrote Limited Resources
on 07-09-2008 2:18 PM

I would assume you have limited time and money to run these tests, so everyone asking you to add more to your plate is being a bit unreasonable.

I think you should just use the basic Office applications, Acrobat Reader, IE, etc.  There is no way you are going to satisfy everyone, although it was nice that you asked for input.  If you can provide some generic data that everyone can use as such, that would be great.

If you want to add a heavy graphical user into the mix since this will be a resource hog in VDI (all rendering done in software) you could just use a 3D PDF file, thus you don't need to add another application to your list.  You can get some sample files here: http://www.bentley.com/BentleyWebSite/Templates/Corporate_Generic.aspx?NRMODE=Published&NRNODEGUID=%7bC01BF61A-564A-4D55-AC9A-255DF2BA226E%7d&NRCACHEHINT=Guest#gallery1.  I would suggest using the "Flyover of a digital terrain model" since it is automated.

 Just some thoughts, thanks and good luck.

Tim Mangan wrote Ideas...
on 07-09-2008 4:35 PM

Consider the kind of results I used to show in those BriForum "Perceived Performance" sessions.  For your test, I would expect to see something like "time to complete" versus the number of loads on the physical hardware.

Having two different kinds of loading scripts would also probably be more fair (and enlightening).  One set that would emphasize CPU performance, and another that would emphasize IO performance (not sure if I mean file or network?).

David Caddick wrote Re: Limited Resources
on 07-09-2008 4:53 PM

Hi Brian, I would also second this - I'm seeing lots of interest in VDI from Education but by their very nature they are also a heavy user of Graphics and Video - I'm sure this would also be the same in the US and EMEA?

Good Luck

Brian Madden wrote Re: Some additional vendors
on 07-09-2008 5:36 PM

Interesting ideas definitely... However, Virtual Iron runs on Xen, and the Provision stuff is protocol-related, and we're testing the hypervisors and not the protocols (in this case). However, since we'll make our whole test scripts available, maybe you can run some additional tests and share your results too?

Brian Madden wrote Re: Provision Networks together with Virtuozzo
on 07-09-2008 5:39 PM

Yeah, Virtuozzo would be interesting, as would Terminal Server. To be honest it's going to come down to time. We won't have a huge amount of time to test more than three setups, although maybe that could be something that someone else from the community could do?

Brian Madden wrote Re: Limited Resources
on 07-09-2008 5:40 PM

AWESOME!! These are really cool. Nice idea.

cjr wrote What about the points that make hosting desktops different from hosting servers?
on 07-09-2008 5:42 PM

I would keep the portion that makes VDI what it is, that being the display remoting.  I'm not speaking to user-experience surveys, but the impact to the guests and thus hypervisor that including a display can have, as it will directly impact your peak load and your scaling, is desktop related, and also specific enough where you could get closer to apples-to-apples.

For example, if one of Qumranet's capabilities is the ability to offload processing to a client and VDM 2.1 and/or Wise TCX can offload some processing for some codecs to the client and outside a channel, that will directly impact your scalability and peak load.

To me, these would be some items that would maintain a VDI slant.  If it's truly "hypervisor-only", but with more "typically desktop" application loads, that's fine -- but ends up with too many what if's on application types, etc.

Even with a medium load, and running several scenarios for which the desktop display factor is a performance impact, it's easy enough to see how much this can vary your scaling.  But on a regular PC, run a good-quality video stream from a flash-based player, and watch your CPU sit at 20-50%.  Do it on a somewhat constrained VM, and watch it sit at 95-100%.  Now have a bunch of people do that at around the same time.  (ever see a Corp Comm. link to a video, or a ton of people look at the same news item at the same time?)

(BTW, if anyone out there is able to offload to a client today, for streaming media in Flash, please speak loudly and clearly, I'm listening.)

Sharin Yeoh wrote How many VMs ...
on 07-09-2008 7:39 PM

Hi Brian,

with reference to how many vms you can throw at it, would you consider using XenDesktop with Provisioning Server so that you can have the single "golden" image streamed to multiple "diskless" virtual machines?

what type of SAN storage would you be using?

would you be simulating load balancing/fail over or DR (eg. VMotion, XenMotion etc)

personally I think the end user device would also make a difference to these tests, e.g. Thin Client devices (HP, IGEL, WYSE - running LinuxOS or XPe) etc, as well as the protocol (SPICE, RDP & ICA would differ in supporting things like USB devices etc).

But I understand that you are testing just the load on the Hypervisors at this stage :) Anyway, can't wait to see the outcome...Citrix rocks!!! (I'm biased..sorry!)

 

 

Tony wrote Great News!
on 07-09-2008 8:29 PM

Brian,

This is awesome...I've been hounded by QN sales guys since I met them at BriForum 2007, to the point where I mentioned it to Navin at this year's BriForum.  I think they have a really interesting product with a lot of unanswered questions on performance.  I was told by their sales guy that they don't do "demos"; however, they were handing out USB sticks with demos at BriForum.  I already have a bad first impression just based on the sales guy I've dealt with, and my biggest issue was that I don't know anyone who's using or even piloted their product - this is great!!!

One of the things that I was told about QN is that they have a feature similar to linked clones in VMWare Workstation that you can have a VM template and clone all your other VM's and only have delta changes to reduce storage requirements, I'd be interested to see how well this really stacks up.

A few users have already mentioned storage, so I'll contribute to that as well.  Being that it's Linux based I'm assuming the storage of choice would be NFS.  A lot of people have had problems getting iSCSI to work with Xen so I'd be interested in how easy (or hard) it is to get iSCSI working.

Lastly, it is Linux KVM...so seeing how the deployment of QN would be interesting especially if you look at it from a linux-fearing Windows administrator point of view.  Personally I like Linux but there are quite a few people out there that have a gripe with ESX because they think they need to know Linux to install and maintain it.