Ruben and I are getting close to the release of the Project VRC phase 2 white paper. (Here's a primer on Project VRC if you haven't heard of it.) Although we've been discussing and presenting the best practices from the first paper around the world, it's still important to understand Project VRC’s methodology and how the results should be interpreted. This article is a preview of the next VRC white paper:
Virtual CPUs (vCPUs)
A Terminal Server is similar to a highway: it's shared by its users. And just as a highway with multiple lanes is more efficient than one with a single lane, so too is a Terminal Server with multiple vCPUs. Besides the obvious advantage of increased capacity, adding a second lane means the impact of an accident or slowdown is greatly reduced, since traffic can still get through via a free lane. In other words, having more than one lane means a single slowdown doesn't affect everyone.
This is not fundamentally different with Terminal Server workloads. Configuring each VM with a single vCPU could theoretically be more efficient (or faster). But in the real world of Terminal Server workloads, this is highly undesirable. These workloads vary greatly in resource utilization and are typified by hundreds or even thousands of threads in a single VM, so a dual-vCPU setup gives Terminal Server users much better protection against congestion issues than a single vCPU.
This means that a minimum of two vCPUs per VM was configured for all Project VRC tests, even though a single vCPU has proven to be more efficient. This best practice is also valid for practically all real-world Terminal Server-based workloads. (This also applies to Terminal Servers running third-party add-ons such as Citrix XenApp.)
Another important best practice, valid for all tested hypervisors, is not to overcommit the total number of vCPUs in relation to the available logical processors on the system. For example, on a system with eight logical cores, no more than eight vCPUs should be assigned in total to all VMs running on the host. (Well, technically this is only important when the primary goal is to maximize user density.)
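Purely as an illustration (a hypothetical sketch, not part of the actual VRC test harness), this rule boils down to a very simple check:

    # Hypothetical sketch (not part of Project VRC): check whether the total number of
    # vCPUs assigned to all VMs on a host exceeds the host's logical processors.
    def vcpus_overcommitted(vcpus_per_vm, logical_processors):
        total_vcpus = sum(vcpus_per_vm)
        return total_vcpus > logical_processors

    # Four dual-vCPU Terminal Server VMs on an eight-core host: not overcommitted.
    print(vcpus_overcommitted([2, 2, 2, 2], logical_processors=8))     # False
    # A fifth dual-vCPU VM tips the host into overcommitment.
    print(vcpus_overcommitted([2, 2, 2, 2, 2], logical_processors=8))  # True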
Various tests in phase 1 of Project VRC have proven that overcommitting vCPUs negatively affects performance. This is not completely surprising, since overcommitting means that multiple VMs must share individual logical processors, which creates additional overhead.
As a result, Project VRC will not overcommit vCPUs in any tests, since maximizing user density is the primary goal. But it's important to understand that overcommitting vCPUs is not prohibited in every case. For example, when the main goal is to maximize the number of TS/XenApp VMs (good old-fashioned server consolidation), overcommitting vCPUs is no problem at all. (In these cases, however, configuring two vCPUs per VM is still recommended because the “highway” principle still applies.)
Transparent Page Sharing
vSphere’s ability to overcommit VM memory and de-duplicate memory through transparent page sharing (TPS) is highly useful for consolidating many VMs onto a single server. Nevertheless, one of the older Terminal Server best practices floating around the Internet communities was to disable TPS. And in fact, Project VRC phase 1 showed that disabling TPS actually improved performance by 5-10%. This makes sense, since TPS relies on a background process that scans and reallocates memory, consuming a modest amount of CPU in the process.
When the primary objective is to maximize the number of users with Terminal Server workloads and there is enough physical memory available, we still recommend disabling TPS. As a result, all Project VRC tests were conducted with TPS disabled, unless stated otherwise.
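For reference, here's how this is typically done (an illustrative sketch based on VMware's documented advanced settings, not necessarily the exact steps used in the VRC test environment): TPS can be switched off host-wide by setting the advanced memory option Mem.ShareScanGHz to 0, or disabled for an individual VM by adding the following line to that VM's .vmx file:

    sched.mem.pshare.enable = "FALSE"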
However, this VRC recommendation should not be understood as an overall recommendation to disable TPS. For instance, when the main goal is to maximize the number of VMs (which is quite common, e.g. in VDI and typical server consolidation efforts), TPS can be very helpful and is recommended.
Interpreting Project VRC Results
Lastly, Project VRC uses the product-independent Login Consultants VSI 2.1 benchmark to review, compare, and analyze desktop workloads on TS and VDI solutions. The primary purpose of its VSImax metric is to allow sensible and easy-to-understand comparisons between different configurations.
The data produced by Project VRC is therefore only representative of VDI and TS workloads. Project VRC results cannot and should never be translated to other workloads such as SQL, IIS, Linux, Unix, Domain Controllers, network services, etc.
Also, the “VSImax” results (the maximum number of VSI users) should never be interpreted directly as real-world results. The VSI workload has been made as realistic as possible, but it remains a synthetic benchmark with a specific desktop workload. Real-world TS and VDI performance is completely dependent on the specific application set and on how and when those applications are used. To include specific applications or customize the VSI 2.1 workload, VSI PRO must be used.
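To give a feel for what VSImax represents, here is a deliberately simplified, hypothetical sketch of the idea: find the session count at which response times first exceed a threshold. This is not the actual Login Consultants VSI 2.1 calculation, just an illustration of the concept with made-up numbers:

    # Simplified, hypothetical illustration of a VSImax-style saturation point.
    # NOT the actual VSI 2.1 algorithm; the threshold and data below are made up.
    def saturation_point(avg_response_times_ms, threshold_ms=2000):
        # avg_response_times_ms[i] = average response time with i+1 active sessions
        for sessions, rt in enumerate(avg_response_times_ms, start=1):
            if rt > threshold_ms:
                return sessions  # first session count where the threshold is exceeded
        return None  # threshold never reached within the test run

    # Response times climb as sessions are added; the "max" lands at session 6 here.
    print(saturation_point([800, 850, 900, 1100, 1600, 2300, 3100]))  # 6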
It is important to stress that no benchmark or load-testing tool can predict the real-world capacity of an environment with 100% accuracy, even when the actual applications are used. Real-world users are simply not 100% predictable in how and when they use their applications.
Do you virtualize TS/XenApp workloads?
Nowadays, there are many reasons to virtualize TS/XenApp workloads, and there are also reasons not to. Do you agree with the Project VRC best practices? Do you virtualize TS/XenApp? Why (or why not)? Have you been successful with it?