When people started building VDI environments, they approached the storage design the same way they had always done with server environments: VDI vendor lab tests gave some indication about the required IOPS and all was well. How wrong they were!
As it turned out, the VDI requirements for storage are a world away from server infrastructure requirements. Everything is different, as the first VDI projects soon discovered.
Where server infrastructures put a fairly constant load on the infrastructure, VDI environments are far more erratic in their behavior. Of course there are exceptions, but in general server infrastructures demand more storage space than performance, while VDI infrastructures are exactly the other way around. Be aware that this doesn’t mean that server infrastructures don’t require performance considerations. Especially with disks becoming larger and larger while the IOPS per disk stay the same, the GB/IO ratio is something to consider in any environment.
But the demand for IOPS was not the only problem; the read/write ratio of VDI infrastructures was even worse. Instead of the 70/30 read/write split that lab tests had shown, real-world scenarios showed 15/85. And since all storage infrastructures to that day were space centric, most were set up with RAID 5 volumes. With the 4x write penalty of RAID 5, the number of IOPS the volumes could deliver was much lower than it would have been had the VDI workload been read intensive. This meant that to deliver the required IOPS (which turned out to be mainly writes), more disks were needed. Adding those to the budget after the project is well underway makes stakeholders very unhappy.
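To make the effect of the write penalty concrete, here is a minimal sketch of the usual back-of-the-envelope calculation: every frontend read costs one disk IO, every frontend write costs as many disk IOs as the RAID level's write penalty. The function name, the 10,000 IOPS example load, and the RAID 1 penalty of 2 are assumptions for illustration; only the RAID 5 penalty of 4 and the two read/write ratios come from the text. Caches are ignored.

```python
# Disk IOs caused by one frontend write, per RAID level.
# The RAID 5 value (4) is from the text; the RAID 1 value (2) is the
# commonly quoted figure and is an assumption here.
WRITE_PENALTY = {"raid1": 2, "raid5": 4}


def backend_iops(frontend_iops: int, read_pct: int, raid_level: str) -> int:
    """Disk-level IOPS needed to serve a frontend load, ignoring caches."""
    reads = frontend_iops * read_pct // 100
    writes = frontend_iops - reads
    return reads + writes * WRITE_PENALTY[raid_level]


# Same hypothetical frontend load, lab ratio versus real-world ratio.
lab = backend_iops(10_000, read_pct=70, raid_level="raid5")
real = backend_iops(10_000, read_pct=15, raid_level="raid5")
print(f"70/30 on RAID 5: {lab} disk IOPS")
print(f"15/85 on RAID 5: {real} disk IOPS")
```

Under these assumptions the write-heavy profile needs nearly twice the disk IOPS of the lab profile for the same frontend load, which is why the disk count ballooned.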
Unfortunately, that’s not the end of the story. Even when IOPS were under control and the number of spindles had been calculated correctly, VDI projects still failed. It turns out that not only the number of IOPS and their read/write ratio are important; block size, latency, and many other aspects exposed new bottlenecks as well. And with that, the way the different storage solutions handle different types of workloads showed once again that not all storage is created equal and not all VDI deployment models are equal. They all have their specific strengths and weaknesses and require different sizing considerations.
VDI projects have been around for more than 5 years now. Luckily, 9 out of 10 (or perhaps 8 out of 10) projects show a common trend in the load they generate on storage. If there’s no information on which to base the sizing, these trends are a good starting point.
The whitepaper 'Storage design guidelines, sizing storage for VDI' is the result of months of hard work by my colleagues Herco van Brug and Marcel Kleine. Their extensive whitepaper is available for download. It offers many new insights into design guidelines and sizing storage for VDI. Recommended reading for everyone consulting on, designing or supporting VDI solutions.
A typical VDI profile is heavy on writes (85% of all IOPS) and uses mostly 4kB blocks. Other block sizes also occur, which brings the average block size to around 10-12kB. Because this typical VDI profile is almost 100% random, the storage cache has a hard time making sense of the streams and usually gets bypassed completely, leading to cache hit rates of no more than a few percent.
If nothing is known about a client’s infrastructure, this may be a good starting point for estimating the load of a new VDI infrastructure. The problem is that a growing percentage of projects don’t adhere to these trends. Some environments have applications that use all the resources of the local desktops. When those are virtualized, they will kill performance.
Sizing for server backends can be just as hard as for VDI. The difference, though, is that with servers the first parameter is the size of the dataset they manage. When that is sized correctly (RAID overhead, growth, required free space, et cetera), the number of hard disks needed to deliver the storage space generally provides more IOPS than the servers need. That’s why a lot of storage monitoring focuses on storage space and not on storage performance. Only when performance degrades do people start to look at what the storage is actually doing.
Comparing infrastructures is a dangerous thing, but there is some similarity between the various office automation infrastructures. They tend to have a 60-70% read workload with a block size of around 40kB. Because there is some sequentiality to the data streams, the cache hits of storage solutions average around 30%.
As with VDI, these numbers mean nothing for any specific infrastructure but give some indication of what a storage solution may need to deliver if nothing else is known.
No matter the workload, there will always be peaks that exceed the intended performance maximums. But peaks are rarely a problem. If they dissipate as quickly as they appear, nobody will complain, not even VDI users. It’s only when peaks take longer to drain that latencies rise. This has everything to do with caches. Peaks arrive in caches first, and when the backend handles them in time, all is well. But when the system is fully loaded and peak loads come in, it takes longer for the cache to drain to disk.
The following sample graph illustrates this.
At points 2 and 3 the cache handles a peak load that is well above the maximum IO level of the disks (green line). But at point 1, the load is too high for the disks to handle, leading to delayed responses and increased latency. In VDI infrastructures, this leads to immediate user dissatisfaction. This is also why storage is so very important in VDI solutions. To keep users happy, those delayed peaks are unacceptable. Therefore, the infrastructure can’t be sized with just the minimum number of disks, because any peak would immediately be a problem. That’s why VDI solutions are sized at peak averages, meaning that at least 95% of all (write) peaks fall within the specs that the disks can handle, excluding caches completely.
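One way to read "at least 95% of all peaks fall within the specs" is as a 95th-percentile calculation over measured write IOPS. The sketch below uses the nearest-rank percentile method; the function name and the sample measurements are hypothetical, and the text does not prescribe this particular method.

```python
def peak_average(samples: list[int], percentile: int = 95) -> int:
    """Nearest-rank percentile: the smallest sample value that is greater
    than or equal to `percentile` percent of all samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Ceiling of percentile * n / 100, done in integers to avoid float rounding.
    rank = (percentile * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Hypothetical write IOPS measurements, one value per monitoring interval.
write_iops = [3200, 3500, 4100, 3900, 9800, 3600, 4400, 5200, 4800, 3700,
              4000, 4300, 3800, 4600, 5100, 4200, 3900, 4700, 5000, 4500]

# The disks (without cache) would be sized for this value.
print(peak_average(write_iops))
```

With these sample values the single 9800 IOPS outlier falls outside the sizing target, while every other interval fits within what the disks alone can deliver.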
Boot storms, logon storms, and steady state
Everybody who has been dealing with VDI over the last few years knows that VDI is heavy on writes. Yet, when VDI projects first started, white papers stated the contrary, predicting a 90% read infrastructure.
The main reason for this was that they were looking at a single client. When they measured a single VDI image they observed a picture somewhat like this:
The graph above shows the IO load of a single VDI desktop (as seen on the VDI desktop). On the left is the high read IO load (up to several hundred IOPS) of the OS booting. Then the user logs in, which shows up as about 50 IOPS at 60% reads. After that the user works at about 5-10 IOPS with 85% writes for the rest of the day.
Logon storms are often considered problematic. In reality though, only 3% to sometimes 5% of all users log on simultaneously. That means that the heavy reads of those 3% of users somewhat dissipate against the normal (steady state) operation of everybody else that uses the infrastructure. In practice, the peak logon at 9 a.m. observed in most VDI infrastructures uses around 25% extra IOPS, still in the same 15/85% R/W ratio!
That’s why VDI guidelines state that for light, medium and heavy users, the IOs needed are 4, 8 and 12 respectively, and an additional 25% for peak logon times.
Sizing for 1000 users at 80% light and 20% heavy would require the storage infrastructure to deliver:
1000 * (80% * 4 + 20% * 12) * 125% = 7000 IOPS, at 15% reads and 85% writes. And that’s it.
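The same calculation as a small sketch, so it can be rerun for other user mixes. The per-user IOPS figures, the 25% logon peak, and the 15/85 split come from the text; the function and variable names are made up for illustration.

```python
# Steady-state IOPS per user type, from the guideline quoted in the text.
IOPS_PER_USER = {"light": 4, "medium": 8, "heavy": 12}
LOGON_PEAK_PCT = 125  # steady state plus 25% for the logon peak
READ_PCT = 15         # 15% reads, 85% writes


def required_iops(users: int, mix_pct: dict[str, int]) -> int:
    """Frontend IOPS for a user mix given in whole percentages."""
    if sum(mix_pct.values()) != 100:
        raise ValueError("user mix must add up to 100%")
    # Integer arithmetic: the two percentages add a factor of 100 * 100.
    scaled = sum(users * pct * IOPS_PER_USER[kind] for kind, pct in mix_pct.items())
    scaled *= LOGON_PEAK_PCT
    return -(-scaled // 10_000)  # ceiling division


total = required_iops(1000, {"light": 80, "heavy": 20})
reads = total * READ_PCT // 100
writes = total - reads
print(total, reads, writes)
```

For the example in the text this gives 7000 IOPS in total, of which 1050 are reads and 5950 are writes.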
Logon storms induce the 25% extra load because of the low concurrency of logons, and boot storms are taking place outside of office hours. The rest of the day is just steady state use of all the users.
The next factor is boot storms. These introduce a whole new, very heavy read load on the storage, although read caches on the host may greatly reduce the reads that need to come from the array. Citrix PVS can even eliminate all reads at boot by caching them for all desktops in the PVS server’s memory.
When done during office hours, the boot storm has to take place in cache or it will become unmanageable. If possible, it is best practice to keep the boots outside office hours. That also means that cool features like resetting desktops at user logoff have to be used sparingly. It’s recommended to leave the desktop running for other users and only reboot at night. Even with local read cache on hosts, a boot that takes place during office hours will not be precached and will flush the cache instead. Only when all desktops boot at once can these caches actually help. So if boot storms need to be handled during office hours (consider 24-hour businesses), it’s better to have them all boot at once rather than have them all boot in turns. This will increase processor and memory usage but the performance impact on storage will not just slow it down, it may actually render the storage infrastructure completely unusable.
But in most environments, it is best practice to boot all machines before the users log in. This means that the OS doesn’t have to boot in 30 seconds but it can take much longer, as long as it’s done by the time the first user arrives. Be aware that it’s also best practice to avoid booting after a user logs off. In environments where users log in and out of different end points, their desktops preferably stay on in between. If a reboot is performed each time a user logs off, the strain on the storage would be much higher. Try to avoid boots during office hours.
It's not about IOPS
But it’s not all about IOPS. There are several other factors that need to be addressed when designing a storage solution.
While all focus in VDI projects seems to go to the delivery of as many IOPS as possible, the virtual desktops also have to land somewhere on disk. When using spindles to deliver IOPS, the IO/GB ratio of rotating disks is usually low enough that space isn’t an issue. For example, with 1000 users needing 10 IOPS per user, at 80 IOPS per disk in a RAID 1 set, a storage solution would need 126 disks. With 300GB per disk in RAID 1, that system would give almost 19TB of storage space, or 19GB per user. In most VDI infrastructures that’s far more than (non-persistent) user desktops actually need.
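The arithmetic behind that example can be sketched as follows. It follows the text's own simplification: the disk count is the required IOPS divided by the IOPS per disk, rounded up to an even number because RAID 1 works in mirrored pairs, and no write penalty is applied. The function names are hypothetical.

```python
def raid1_disks_for_iops(users: int, iops_per_user: int, iops_per_disk: int) -> int:
    """Spindles needed for the IOPS, rounded up to whole mirrored pairs.

    Mirrors the simplified calculation in the text: no RAID write
    penalty and no cache are taken into account.
    """
    disks = -(-(users * iops_per_user) // iops_per_disk)  # ceiling division
    return disks + disks % 2  # RAID 1 needs an even number of disks


def raid1_usable_gb(disks: int, disk_gb: int) -> int:
    """Usable space of a RAID 1 set: half the disks hold mirror copies."""
    return disks // 2 * disk_gb


disks = raid1_disks_for_iops(users=1000, iops_per_user=10, iops_per_disk=80)
usable_gb = raid1_usable_gb(disks, disk_gb=300)
print(disks, usable_gb, usable_gb / 1000)  # disks, total GB, GB per user
```

This reproduces the 126 disks and 18,900GB (18.9GB per user) from the text. Applying a write penalty to the write share of the load would increase the disk count further.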
To get a feel for the amount of storage that a user desktop needs, the following breakdown has been created.
The User, System and Hypervisor layers each need a specific amount of space. The following picture gives an overview of the sizing of the three layers. This is somewhat of a worst-case scenario, since not all solutions require a 3D footprint or a spooler that should be able to print 100 pages of a PDF file. Some non-persistent deployments show that the user and system buffers are hardly used, that the desktops use reservations, and that policies and application deployments don’t apply. In that case the desktops could fit in just under 3GB apiece. Also, with user data put on CIFS shares (folder redirection), the amount of data per desktop can decrease too. But in general, the space needed to store desktops is built from the following components:
That means that for 1000 users, at 6.7GB each, the storage system needs 7TB of space. Hosting that on SLC flash disks of 200GB, even in RAID 5, would mean the storage system needs 48 SSDs. With a practical 20000 IOPS per disk, it will never run out of IOPS, provided that the backplane of the storage solution can handle those 48 disks. Just adding SSDs to an array is dangerous though. With the heavy write workload and no intelligence to coalesce IOs other than what the controller on the SSDs themselves does, the life expectancy of those SSDs can be quite low. The whitepaper 'Spinning out of Control' explains this in more detail.
It’s important to note where the IOPS are actually happening. While the boot and logon storms are mainly read from and written to the C: drive, the steady state shows a different picture. Only a few percent of the IOPS still take place on the C: drive, and the rest are divided equally between the user’s profile and his or her applications.
It may seem to make sense to use some form of auto tiering. This would effectively combine the IOPS of SSDs with the space of HDs. Unfortunately, all solutions use a scheduled auto tiering principle, meaning that once a day they evaluate hot blocks and move them (usually in large blocks of multiple dozens of MBs) between SSDs and HDs. But when all the desktops are cloned at night, all blocks are hot and need to land on SSD, and before they get evaluated they have already been cleaned up. Solutions that do this inline in either the broker software or the storage don’t exist yet (H1-2013).
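The SSD count can be sanity-checked with a short sketch. The text does not state the RAID 5 group layout; assuming groups of 3 data disks plus 1 parity disk happens to reproduce the 48 SSDs mentioned, but that layout is an assumption, as are the function and parameter names.

```python
def raid5_ssd_count(required_gb: int, ssd_gb: int, data_disks_per_group: int = 3) -> int:
    """SSDs needed for a capacity target using RAID 5 groups of N data + 1 parity.

    The default of 3 data disks per group is an assumption for illustration;
    the text does not specify the group size.
    """
    data_disks = -(-required_gb // ssd_gb)            # ceiling division
    groups = -(-data_disks // data_disks_per_group)   # whole RAID groups only
    return groups * (data_disks_per_group + 1)


ssds = raid5_ssd_count(required_gb=7000, ssd_gb=200)
raw_iops = ssds * 20_000  # practical IOPS per SSD, from the text
print(ssds, raw_iops)
```

Under this assumption the array offers close to a million raw IOPS against a requirement of a few thousand, which illustrates why capacity rather than IOPS drives the disk count when sizing with flash.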
There is much more to tell about 'Storage design guidelines, sizing storage for VDI'. Get a head start! Download our complete, in-depth, and independent whitepaper. We try to provide accurate, clear, complete and usable information. We appreciate your feedback!