Does OS "tuning" help VDI performance? (Part 1)

OS tuning doesn't impact the number of VMs per host or impact the end user's perceived performance anymore.

A few weeks ago a commenter here (edgeseeker) suggested that I was “still living in the 90’s” when I stated that tuning the desktop OS, removing services and tweaking certain configurations could improve performance in a VDI environment. The point being made was that all of the OS tuning being done didn’t impact the number of VMs per host or impact the end user’s perceived performance anymore.

I took this comment seriously, as sometimes we just continue to “do it this way” because we have always “done it this way”. But without objective results about the impact on scalability or user performance, you are essentially perpetuating a tradition rather than being a logical technical guru and spending your time productively.

So with all of that said I decided to run a little test to find out if OS tweaking showed a significant improvement in performance or scalability. The basic idea was to use a repeatable workload that would allow us to look at performance of a Tuned and non-Tuned VM over an hour of running the fixed workload. We could then compare things like MHz in use over the hour, memory in use and even application launch times in each configuration.

To create a repeatable workload I decided to use the VMware Reference Architecture Workload Creator (RAWC) with the workload set to static. I also decided to use a single Windows 7 VM that would run the workload first as a non-Tuned install, then would be shut down, Tuned by hand, and then would re-run the exact same workload. This would ensure that the tuning was the only difference between the two runs. Finally, this test was run on an ESX host with no other VMs to ensure no scheduling conflicts or outside factors had an impact on the results.

So what happened?

First we ran the test with a non-Tuned OS. The image had Office 2007, McAfee AV, Adobe Reader, Java, Skype, and Tweetdeck (this is my standard Win7 image). The RAWC client was installed and set to use Word, Excel, Adobe, IE8 and the RAWC Java application. With the test configured we then ran the RAWC workload for 60 minutes. The chart below shows the CPU usage over that 1 hour time frame.

CPU use:

[Chart: CPU usage over the one-hour test, non-Tuned VM]

You can see the logon spike at about 9:30 am. After that the repeating workload is started. Average CPU for the 1 hour test is 179 MHz. Not a heavy workload mind you, but the “user” is always doing something whether it is reading a PDF file, typing in Word or Excel or using the Java compile application (seen in the graphs as the higher CPU spikes) or browsing using IE8.

Memory use:

[Chart: memory usage over the one-hour test, non-Tuned VM]

Active memory average was approximately 240MB, with a consumed memory average of 952MB.

Average application launch times:

[Chart: average application launch times, non-Tuned VM]

With the baseline set it was now time to do some basic tweaking. Below I will list out all of the changes made to the VM. These were found at various websites, blogs and MS articles. But it is important to note two things:

  1. Your environment may depend on one or more of these services or settings, so disable them at your own risk.
  2. This is not an all-inclusive list of tweaks! We assumed that even if some were missed we should still see some change in performance/response time.

So what did we change?

  • Shut down and disabled the following services:
    Application Experience
    Desktop Window Manager Session Manager
    Diagnostic Policy Service
    Distributed Link Tracking Client
    IP Helper
    Offline Files
    PNP service
    TCP/IP NetBIOS Helper
    Themes
    Windows Media Player Network Sharing Service
    Windows Search
  • System Restore was disabled
  • Automatic updates was disabled
  • Defrag was disabled
  • Indexing was disabled in BOTH VMs
  • Desktop was set to Basic Theme and Aero was disabled
  • Visual Effects were set to adjust for best performance
  • Windows7 Features removed from add/remove programs:
    Games
    All Media Features
    Remote Differential Compression
    Windows Gadget Platform
  • Files and folders view was set to always show icons never Thumbnails
  • Bluetooth support removed from MSconfig
  • IPv6 and other “network” services removed from the NIC
  • MenuShowDelay modified from 400 ms to 1 ms
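
Several of the changes above can be scripted instead of applied by hand. The sketch below only builds the command lines for the built-in `sc` and `reg` tools; it does not run anything. The short service names in the mapping are an assumption about how the display names above map on Windows 7, so confirm each one with `sc query` on your own image first. The ambiguous “PNP service” entry is left out on purpose.

```python
# Assumed mapping from the display names listed above to Windows 7 short
# service names; verify each one on your own image before relying on it.
SERVICES = {
    "Application Experience": "AeLookupSvc",
    "Desktop Window Manager Session Manager": "UxSms",
    "Diagnostic Policy Service": "DPS",
    "Distributed Link Tracking Client": "TrkWks",
    "IP Helper": "iphlpsvc",
    "Offline Files": "CscService",
    "TCP/IP NetBIOS Helper": "lmhosts",
    "Themes": "Themes",
    "Windows Media Player Network Sharing Service": "WMPNetworkSvc",
    "Windows Search": "WSearch",
}


def build_tuning_commands(services=SERVICES):
    """Return the command lines as strings; nothing is executed here."""
    commands = []
    for short_name in services.values():
        commands.append(f"sc stop {short_name}")
        # sc.exe expects the space after "start=".
        commands.append(f"sc config {short_name} start= disabled")
    # MenuShowDelay is a string value under the per-user Desktop key.
    commands.append(
        'reg add "HKCU\\Control Panel\\Desktop" /v MenuShowDelay /t REG_SZ /d 1 /f'
    )
    return commands


if __name__ == "__main__":
    for line in build_tuning_commands():
        print(line)
```

Because the MenuShowDelay value lives under HKCU, it applies per user; in a gold image it would have to reach each user's profile some other way (a logon script, for instance).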

Note that some items (such as screen savers and IE settings) were ignored as the test machines would never get to “idle” and IE temporary settings were not important to these test results. These are still important settings for your environment (preventative medicine if you will) but were not used as they would not have impacted these tests.

With all of these configurations changed, the machine was rebooted and logged in to run the RAWC workload again.

CPU use - Tuned machine:

[Chart: CPU usage over the one-hour test, Tuned VM]

Again we have a login spike (just after 11:00) and then the repeating workload. The average CPU in use is 180 MHz, essentially the same as the non-Tuned VM.

Memory use - Tuned machine:

[Chart: memory usage over the one-hour test, Tuned VM]

The active memory in the Tuned VM was actually 264MB, approximately 24MB HIGHER than in the non-Tuned VM. The consumed memory for the VM averaged 975MB, again about 23MB more than the non-Tuned VM.
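
To put the CPU and memory averages side by side, here is a small sketch that turns the figures reported in this article into relative changes. The helper name `percent_change` is just for illustration.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# (non-Tuned, Tuned) averages reported in this article.
results = {
    "CPU (MHz)": (179, 180),
    "Active memory (MB)": (240, 264),
    "Consumed memory (MB)": (952, 975),
}

if __name__ == "__main__":
    for metric, (untuned, tuned) in results.items():
        change = percent_change(untuned, tuned)
        print(f"{metric}: {untuned} -> {tuned} ({change:+.1f}%)")
```

For these numbers that works out to roughly +0.6% CPU, +10% active memory and +2.4% consumed memory.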

Average application launch times:

Here is where we see a benefit from the tuning. While CPU and memory were effectively unchanged, the application response/launch times seemed to improve overall:

[Chart: average application launch times]

As you can see, the average response time was shorter in the tuned OS. Applications launched (on average) almost 9% faster in the tuned VM. Word seemed SLIGHTLY slower, but it had one sample that was more than 25% longer than the others, which may have skewed the results. A larger data set (additional runs) might even that average out.
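
To see how one slow sample can pull an average, here is a minimal sketch with made-up launch times (the numbers are hypothetical, not the RAWC results). Comparing the mean to the median is one simple way to spot this kind of skew.

```python
import statistics

# Hypothetical launch times in seconds for one application; the last
# sample is an outlier roughly 30% slower than the rest.
launch_times_s = [2.0, 2.1, 1.9, 2.0, 2.6]


def summarize(samples):
    """Return (mean, median) so the two can be compared."""
    return statistics.mean(samples), statistics.median(samples)


if __name__ == "__main__":
    mean_s, median_s = summarize(launch_times_s)
    print(f"mean   = {mean_s:.2f} s")
    print(f"median = {median_s:.2f} s")
```

With these made-up samples the mean lands at 2.12 s while the median stays at 2.00 s, so a gap between the two hints that a single sample is doing the skewing.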

Conclusion: Is it worth it?

With very little research (about 15 or 20 minutes worth) and about 15 minutes inside the OS we were able to make changes that would speed up application launches (impact perceived performance) by 9% or so… So yeah! It’s worth it! Changes like these to a gold image that may be used for hundreds and hundreds of desktops are simple to do and don’t take very long. But we (as IT geeks) should realize that this “tuning” may not get us any more VMs on the host servers or decrease the CPU or memory footprint significantly; it is instead about perceived performance from the user perspective. Also, any benefits you may see in RAM usage are limited, as common/duplicated services will often fall prey to the memory sharing processes of ESX, so removing them from the system will show little benefit.

But Ron, what about other things? Like indexing and defrag, etc.? Those could kill CPU when the user is not even online! You bet they could. And I think it is best to disable those types of services as a preventative measure. But all these other services are small and generally eat up less than a percent of CPU and a MB or 4 of memory, and will show little impact on your scalability even if they do help response time.

Bottom line? Tuning seems to have helped perceived performance, but it seems to have no impact on the number of VMs per host, with the exception of things like the indexing services eating CPU when people are not even using the system (which is not trivial).

23 comments

Defrag on Windows 7 must absolutely be disabled, since it will kill the VM and generate no benefit.


I believe this 9% benefit would scale when dozens of VMs are running on the same host, and even give you some more VMs per host (just a guess). Those little CPU cycles saved by disabling many services, might also be significant when you multiply this by many VMs.


Another point to mention: Disabling animations and visual effects will improve network usage and display protocol response times, so, it is definitely worth doing.



Anyone from a Terminal Services background should be feeling a sense of 'Deja Vu' while reading this.


Any 'shared platform', whether it is a Terminal Server or a Hypervisor will benefit from optimisation of the workloads they are expected to handle. This does not just derive benefits of increased performance but also better reliability, scalability and security. In addition, any application delivered using a remoting protocol will obviously benefit from optimisation of any factor which increases the amount of data transferred across the wire.


The most successful Terminal Server projects I have been involved in are those which understood the necessity to optimise everything and allowed ample time in each project to ensure that this happened. This will also be the stick by which successful VDI projects are measured.


Nothing new here, certainly as far as us old TS/RDS guys are concerned.


Come on Ron, you just dug out one of your old docs, did a 'find' on TS and substituted VDI!!



9% difference between two measurements - I have a BIG problem with using this number in any argument:


1) That number surely is within measuring inaccuracy. If you want to get reliable numbers, repeat the measurement multiple times.


2) Users do not notice performance differences smaller than 50%. So even if 9% is correct, that is an absolute number. Relative (perceived) difference will be 0.



Ron,


Fantastic article, this is the type of detail people are looking for!  I noticed this is part 1.  Will part two include more optimizations and more analysis?  I would very much like to see the impact on IOPS.


It’s too bad the improvement is only 9%. I have to agree with the rest of the posters that 9% will go unnoticed by the user community.  That said, in a 10,000 seat deployment it could translate to significant savings.  Our prospects and customers have been telling us they have explored this as well.  They all seem to come to a similar conclusion.  It typically plays out something like this.


Our VDI deployment is small compared to physical desktops and Laptops, and the complexity this introduces into the build process is not worth X% savings.   We applied the tweaks using a combination of manual changes and automated script from other vendors.  We have observed that maintaining the tweaks is a major chore as application and OS updates frequently override or negatively impact the modified base OS.


Of course I have a limited working knowledge of how the tweaks are impacted over a long time period.  And the customers and prospects I’m working with may have unique environments which do not represent the majority.



Helge,


The 9% number for application launch time IS a repeated number. The "fixed" workload repeated during the 1 hour span, meaning we didn't measure 1 launch of Word. We measured several and took the average.


Is 9% a HUGE deal? No. Not when you are dealing with a 2 second launch time and essentially cut it to 1.8 seconds. A user won't notice. They may say it "feels" more snappy or something subjective like that, if they notice at all.


My real point here was that in running VMs we wouldn't scale any more with all this tuning done.



Interesting test, but the results are not surprising.  As you know, I've been collecting Win7 optimizations on VirtualFeller.com. Most of the optimizations I've seen so far really wouldn't impact CPU or memory.  I think we should look more closely at the disk subsystem.  As an example, having the OS update the last access time stamp is a write to the disk. Let's get rid of that.  Thus we lower the impact on storage.  These writes really wouldn't place a big impact on CPU and nothing on memory.


Desktop OS optimization is supposed to make the OS more responsive to the user while helping to reduce the stress on the underlying hardware. Most of the stress will be on the disk and not on CPU/Memory.  



Dan,


I think disabling the last access time stamp could reduce the writes in VDI, though it may not change the number of VMs you get per host. My goal in this was to see any change at all in number of VMs per host, and I think these numbers show that there is no change. A slight improvement in app response time? Yes. Noticeable by the end user?  Probably not.


Will it hurt? No, and for a little bit of time it's worth doing in a gold image, but we don't need to kill ourselves.



Correct, I doubt these will impact concurrency on the hypervisor (unless you talk about the big optimizations like defrag, but that is a no-brainer), but it should lessen the load on the storage, which is good. It won't help get more VMs per server, but it might help reduce the storage impact.



Dan,


Exactly.  I think Defrag, and other heavy tasks should be disabled or taken out of the VM and done elsewhere, but as a preventative thing.


Meaning they don't help scalability, but they keep that process from hurting your numbers. If that makes sense.



Dan brings up a question I have.  I think it will be interesting to see what the impact on IOPS is with these tweaks.  



Robert and Dan,


The tweaks listed had no impact (or none that could be measured). Most of these services and changes do so little (as shown by the limited memory and CPU help) that I saw no difference in IO.  Both VMs were essentially the same in the number of operations and the amount of data written or read.


I think Dan's tweak on last access time stamp may show some improvement. But my next one (part two) will talk some about that.


My interesting finding (in further tests) is the serious difference in IO for logon storms based on various profile configurations.   Unlike reboot storms, logons can't be staged or scheduled. Reboots can be mitigated at least by scheduling them somewhat when you update VMs.


Logons... not so much. 8am call center workers all login 7:59 - 8:01 (or whatever).


I found that local profiles reduced IO per desktop by more than 2/3rds vs a standard roaming profile.


A standard roaming profile with folder redirection for Windows 7 (on a desktop that is used) is generating an average of 1600+ IO over a 60 second logon period, while a standard profile (same profile but local) shows about 500 over the 60 second period. It makes sense, but it is often just assumed that you must have a roaming profile. Here is a reason to think about it.
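

A quick check of that arithmetic, as a sketch using the approximate figures quoted above ("1600+" is taken as 1600 and "about 500" as 500, so treat the output as a rough estimate):

```python
LOGON_WINDOW_S = 60  # length of the logon period in seconds


def io_per_second(total_io, window_s=LOGON_WINDOW_S):
    """Average IO operations per second over the logon window."""
    return total_io / window_s


def reduction_fraction(before, after):
    """Fraction of IO removed when going from before to after."""
    return (before - after) / before


roaming_io = 1600  # roaming profile with folder redirection
local_io = 500     # same profile kept local

if __name__ == "__main__":
    print(f"roaming: {io_per_second(roaming_io):.1f} IO/s per desktop")
    print(f"local:   {io_per_second(local_io):.1f} IO/s per desktop")
    print(f"reduction: {reduction_fraction(roaming_io, local_io):.1%}")
```

That comes out to a reduction of just under 69%, which is consistent with "more than 2/3rds".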


Robert,


I know you are in the IO business, and with an IO accelerator the profile thing isn't so important. The only benefit of going local vs roaming in that case is to speed up login.



Ron -


Just what memory measurement is that?  I am guessing that after tuning, the OS had lower committed memory and thus was less aggressive in paging things out of RAM, possibly leading to the higher memory measurement.  If so, this is a good thing which will show up as an added benefit in multi-VM tests.



Looking at the VMware metrics Active Memory and Granted Memory.



Gahwd, OS tuning has for me been somewhat of an obsession and agony over the years, from DOS to Windows 7/2008 R2.  


Say, a lot of us surely remember Rich Dehlinger et.al MetaFrame Tuning Tips from late ‘90s to early ‘00s. See here for example - www.thin-world.com/.../MFTips1.pdf and others.


Tuning ADM templates, batch files and countless tools flew high and low. An example of my own idea of the bare basic “standard” tuning from Windows 2003 TS era is recorded here - jernstrom.org/.../Standard5.zip.


Countless other documents, tools, scripts, adm templates, posts etc. are living in dropbox, discarded USB-pins, dead computers and whatnot. I never disabled any services oddly enough.


It’s not like I’ve not been challenged to those tuning things as I guess no one of us have. My explanation, excuse if you wish, was that these were suggestions and examples that might make some impact and should at least be considered for whatever value they might have. Sometimes some stuff did really matter, other times maybe not so much, still way overdone by any measurement.


The challenge is of course – Is it really worth it? I don’t know, I’m a recovering obsessive tuner :)


As some, I skipped the Vista/2008 stuff (except for GPO Preferences) and jumped back with Windows 7 and 2008 R2.  


As @Helge might recall I went nuts with the 7/08R2 shell, meaning Known Folders and specifically Libraries, aka iShellFolder interface (Windows 7 SDK shlib.exe build) and the RemoveDuplicateProfileLinks.exe source from The Deployment Guys blog.


As of now I’m still as ambivalent as ever, just a bit more pragmatic. Leaving it be, for surely I’m not the one to administer and maintain? :)


I guess I agree with many that best is to just tune the obvious matters in the OS by requirement and concentrate on the application tuning and standard streamlined Workspace environment.


For all I know VM sprawl, wasted disk and problematic disk IOPS are the real problems as of now, way before CPU and Memory allotments.



The results are interesting to say the least.  Honestly, I would be happy if tuning didn't matter as it would make our jobs easier (but don't tell anyone as this stuff makes us look smart).  


I know a lot of the tuning tips are carryovers from the Terminal Services/XenApp optimizations, and many of those do show improvements, especially when we get to memory management (w2k and w2k3).  We also used to recommend disabling all of the splash screens and animations, as the protocols couldn't handle them.  So this begs the bigger question... Has technology improved to a point that makes these optimizations pointless? We know that ICA/HDX can do graphics much better than years ago, same for animations.  We know W2k8/W7 is much better at memory mgmt than W2k3/XP.  


I guess I would need to see more cases to see if it did indeed make a difference. As your test was only 1 VM, I'm hesitant to say "Get lost optimizations".  As it doesn't appear that these settings hurt, I'm erring on the side of caution and plan to continue down the path of an optimized OS.



I'm with Feller on that.


Even if the performance doesn't change a lot on the chart (which is somehow suspicious to me), making the UI lighter will make the protocol's (ICA) life easier and "faster" in the end. I'm not against eye candy, but CPU cycles and bandwidth aren't free.


I've yet to change my mind about optimizations, or to stop making and using adm files for XD and XA. I think everyone should do it, because in the worst case scenario (if I'm completely misled) you get to know the products and the OS better.



Good job, Ron.  


CPU utilization management inside each VM might also be a good idea to further improve perceived performance. Although the hypervisor can manage CPU utilization at the VM level, there's nothing to prevent runaway processes from pegging a particular VM.  I'm actually surprised none of the VDI vendors have done this so far, especially since most of them have had a CPU utilization management offering for RDS for quite a while.


I didn't expect any improvements in terms of number of VMs per host. I've already performed a similar exercise by building a minimum-size desktop image using one of several guidelines and deployment kits out there.


I still like what Atlantis Computing does in terms of I/O acceleration. It's a night-and-day difference.



Interesting Guys.


First up, Yes I do work for AppSense


Second up, Yes performance on Windows systems and especially Shared Infrastructure (Citrix/TS/VDI/Virtual Servers etc) has been a large part of my gig for the last 6 1/2 years.


Our Performance Manager product improves scalability, and provides more consistent performance in any VDI environment.


That's not our corporate message, it's a message my customers give me (so I'm happy to use it).


I've given up arguing with Virtual Infrastructure "experts" about why our product won't help, or won't work due to cpu cycle splitting or some other hypervisor hocus pocus.


I just get customers to test it, they say it works, I'm happy :-)


So trimming down what a VDI OS has to do in the background improves performance - seems pretty logical to me.


Just as logical as the idea that granular management of CPU and Memory INSIDE a VDI Guest will mean the guest gives more consistent performance, and uses fewer resources from the Host.


You learn something new everyday in IT :-)



I think that VDI vendors should include an integrated tool (with the agent or similar) to automatically disable/enable services/features as requested, at least for common VM desktops like XP or 7.


This would consolidate optimization and best practice.



I am still testing what optimizing can do for my current Hyper-V environment.  (I hate VMware.) But from tips to optimize VM OSes to the tools vendors have to reduce IOPS or storage footprint, I think it's a better practice than doing nothing.


Maybe what's needed is a low cost tool to make all the other tips and changes these blogs recommend.  Most people who say they don't think it's worth it say so because they are lazy. If there was a tool I think people would latch on to it.


The accelerators, optimizers and storage tools all seem to make an impact for sure.



@Smcpartin


blogs.technet.com/.../optimising-windows-7-images-for-use-in-vdi.aspx


It's free - doesn't have everything yet but it's a good start.


Dan.



I think the goal of optimizing the guest OS is to improve the user's perception of the virtual environment. Users may not always notice a 9% improvement, but they sure do notice a 9% degradation. The vendors are always looking for ways to optimize the memory and CPU usage on the hypervisor and they will continue to do so. But sometimes, like edgeseeker points out, optimizing inside the guest is still an area where improvements can be made.


I believe the optimizations to implement will be dependent on the applications being virtualized. Consider that some optimizations are specific to a particular application like Internet Explorer's "Force Offscreen Compositing" setting. When that IE setting is not enabled, the users see a lovely blinking effect when some pages load in a remoted environment. Enabling that setting may not change density per server, but not enabling it will result in the user's perceiving the VDI environment to be unusable. I bring this up to point out that such tests may not benefit from generic optimizations or may not change CPU/RAM consumption on the host, but in the real world they could be the difference between a successful POC and a lost opportunity.


As amply discussed above, some optimizations improve other performance metrics such as IOPS and in and of themselves can provide a cost reduction by saving storage costs. I believe that storage response time has the greatest effect on user perception, more so than CPU or even RAM in some cases - probably because despite the move to virtualization many applications and operating systems still make assumptions around the storage being local and fast. Targeting storage related optimizations and software should be at the top of our list for improving user experience. Storage is such an integral part of the user experience that those optimizations almost always make sense.  


I guess all said and done, I would still recommend optimizing the OS where it makes sense. You may not get any more users on the box, but you get happier users that adopt the technology and recommend it to their friends and we all benefit.



To emaxt6's point, Citrix actually includes an optimization tool in XenConvert and the Provisioning Services Image builder. Also, as Daniel pointed out above, Jonathan Bennett (of AutoIT fame) has released a tool that generates a VBS script to optimize the Windows 7 environment, which I find is great for optimizing via logon scripts.


