A few weeks ago a commenter here (edgeseeker) suggested that I was “still living in the 90’s” when I stated that tuning the desktop OS, removing services and tweaking certain configurations could improve performance in a VDI environment. His point was that all of that OS tuning no longer impacts the number of VMs per host or the end user’s perceived performance.
I took this comment seriously, because sometimes we just continue to “do it this way” because we have always “done it this way”. But without objective results on the impact to scalability or user performance, you are essentially perpetuating a tradition rather than spending your time productively.
So with all of that said, I decided to run a little test to find out whether OS tweaking showed a significant improvement in performance or scalability. The basic idea was to use a repeatable workload that would let us compare a Tuned and a non-Tuned VM over an hour of running the same fixed workload. We could then compare things like MHz in use over the hour, memory in use and even application launch times in each configuration.
To create a repeatable workload I decided to use the VMware Reference Architecture Workload Creator (RAWC) with the workload set to static. I also decided to use a single Windows 7 VM that would run the workload first as a non-Tuned install, then be shut down, tuned by hand, and re-run the exact same workload. This ensured that the tuning was the only difference between the two runs. Finally, the test was run on an ESX host with no other VMs, so that no scheduling conflicts or outside factors could affect the results.
So what happened?
First we ran the test with a non-Tuned OS. The image had Office 2007, McAfee AV, Adobe Reader, Java, Skype, and TweetDeck (this is my standard Windows 7 image). The RAWC client was installed and set to use Word, Excel, Adobe, IE8 and the RAWC Java application. With the test configured we ran the RAWC workload for 60 minutes. The chart below shows the CPU usage over that 1-hour time frame.
You can see the logon spike at about 9:30 am. After that the repeating workload starts. Average CPU for the 1-hour test was 179 MHz. Not a heavy workload, mind you, but the “user” is always doing something, whether it is reading a PDF file, typing in Word or Excel, using the Java compile application (seen in the graphs as the higher CPU spikes) or browsing with IE8.
Active memory averaged approximately 240 MB, with a consumed memory average of 952 MB.
Average application launch times:
With the baseline set it was now time to do some basic tweaking. Below I list all of the changes made to the VM. These were found at various websites, blogs and Microsoft articles. But it is important to note two things:
- Your environment may depend on one or more of these services or settings, so disable at your own risk.
- This list is not an all-inclusive list of tweaks! We assumed that even if some were missed we should still see some change in performance/response time.
So what did we change?
- Shut down and disabled the following services:
Desktop Window Manager Session Manager
Diagnostic Policy Service
Distributed Link Tracking Client
TCP/IP NetBIOS Helper
Windows Media Player Network Sharing Service
- System Restore was disabled
- Automatic Updates was disabled
- Defrag was disabled
- Indexing was disabled in BOTH configurations (so it was not a variable in this test)
- Desktop was set to Basic Theme and Aero was disabled
- Visual Effects were set to adjust for best performance
- Windows 7 Features removed from Add/Remove Programs:
All Media Features
Remote Differential Compression
Windows Gadget Platform
- Files and folders view was set to “Always show icons, never thumbnails”
- Bluetooth support removed from MSconfig
- IPv6 and other “network” services removed from the NIC
- MenuShowDelay modified from 400 ms to 1 ms
Note that some items (such as screen savers and IE settings) were ignored, as the test machines would never go “idle” and IE temporary file settings were not important to these test results. These are still important settings for your environment (preventative medicine, if you will) but were skipped because they would not have impacted these tests.
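For anyone who wants to script the service changes above rather than clicking through the GUI, here is a minimal sketch that builds the equivalent sc.exe and reg.exe command lines. The short service names (UxSms, DPS, TrkWks, lmhosts, WMPNetworkSvc) are my mapping from the display names listed above to standard Windows 7 defaults; verify each against your own gold image before disabling anything. Nothing is executed here, the script only prints the commands.

```python
# Map the display names from the tweak list above to the short service
# names sc.exe expects. These mappings are assumptions based on a stock
# Windows 7 install; confirm them with "sc query" on your own image.
services_to_disable = [
    "UxSms",          # Desktop Window Manager Session Manager
    "DPS",            # Diagnostic Policy Service
    "TrkWks",         # Distributed Link Tracking Client
    "lmhosts",        # TCP/IP NetBIOS Helper
    "WMPNetworkSvc",  # Windows Media Player Network Sharing Service
]

def build_commands(services):
    """Build the command lines to stop and disable each service,
    plus the MenuShowDelay registry tweak. Nothing is executed."""
    cmds = []
    for svc in services:
        cmds.append(f"sc stop {svc}")
        # Note: the space after "start=" is required by sc.exe syntax.
        cmds.append(f"sc config {svc} start= disabled")
    # MenuShowDelay lives under HKCU as a REG_SZ value (400 ms -> 1 ms).
    cmds.append(
        r'reg add "HKCU\Control Panel\Desktop" '
        r"/v MenuShowDelay /t REG_SZ /d 1 /f"
    )
    return cmds

for cmd in build_commands(services_to_disable):
    print(cmd)
```

Dropping the output into a .cmd file and running it in the gold image (as administrator) would apply these particular tweaks in a few seconds; the rest of the list (themes, features, visual effects) still needs to be done by hand or via Group Policy.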
With all of these configurations changed, the machine was rebooted and logged in to run the RAWC workload again.
CPU use - Tuned machine:
Again we have a login spike (just after 11:00) and then the repeating workload. The average MHz in use was 180 MHz, essentially the same as the non-Tuned VM.
Memory use - Tuned machine:
The active memory in the Tuned VM was actually 264 MB, approximately 24 MB HIGHER than the non-Tuned machine. The consumed memory averaged 975 MB, again about 23 MB more than the non-Tuned VM.
Average application launch times:
Here is where we see a benefit from the tuning. While CPU and memory were effectively unchanged, the application response/launch times improved overall:
As you can see, the average response time was shorter in the tuned OS. Applications launched (on average) almost 9% faster in the tuned VM. Word seemed SLIGHTLY slower, but had one sample that was more than 25% longer than the others, which may have skewed the results. A larger data set (additional runs) might even that average out.
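To illustrate how much one slow sample can skew a small average like the Word result above, here is a quick sketch. The actual RAWC launch times were not broken out per sample, so these numbers are made up for illustration only:

```python
# Hypothetical Word launch-time samples in seconds; the last one is the
# single outlier, more than 25% above the rest (illustrative data only).
word_samples = [2.1, 2.0, 2.2, 2.1, 2.8]

# Plain mean over all samples, outlier included.
mean_all = sum(word_samples) / len(word_samples)

# Drop the single highest sample and recompute.
trimmed = sorted(word_samples)[:-1]
mean_trimmed = sum(trimmed) / len(trimmed)

print(f"mean with outlier:    {mean_all:.2f}s")   # 2.24s
print(f"mean without outlier: {mean_trimmed:.2f}s")  # 2.10s
```

With only five samples, one bad launch shifts the average by well over 5%, which is why a larger data set (or simply more runs) would give a more trustworthy per-application comparison.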
Conclusion: Is it worth it?
With very little research (about 15 or 20 minutes’ worth) and about 15 minutes inside the OS, we were able to make changes that sped up application launches (perceived performance) by 9% or so… So yeah! It’s worth it! Changes like these to a gold image that may be used for hundreds and hundreds of desktops are simple to make and don’t take very long. But we (as IT geeks) should realize that this tuning may not get us any more VMs on the host servers or significantly decrease the CPU or memory footprint; it is really about perceived performance from the user’s perspective. Also, any benefit you may see in RAM usage is limited, as common/duplicated services will often fall prey to the memory-sharing processes of ESX (transparent page sharing) and therefore show little benefit from being removed from the system.
But Ron, what about other things, like indexing and defrag? Those could kill CPU when the user is not even online! You bet they could, and I think it is best to disable those types of services as a preventative measure. But all these other services are small, generally eating up less than a percent of CPU and a MB or 4 of memory, and will show little impact on your scalability even if they do help response time.
Bottom line? Tuning seems to help perceived performance, but it has little impact on the number of VMs per host, with the exception of things like the indexing service eating CPU when people are not even using the system (not trivial).