Nick Rintalan and I are just back from another successful BriForum conference where we delivered several sessions. One of the recurring themes we have seen at the past several BriForum events, as well as other industry events like Synergy, is the rise of multiple vendors trying to solve the two major challenges associated with VDI:
- Eliminating the need for Persistent Desktops (solving the application challenge)
- Reducing the IOPS and storage resources required for VDI
One of our sessions at BriForum was entitled “VDI High Availability and Why Persistent Desktops are the Devil”. If you attended BriForum 2014 but did not catch our session, I encourage you to watch the video on BrianMadden.com. If you did not attend BriForum, then you will have to wait until the videos get posted to YouTube (probably early next year). However, I will try to recap some key points of the session as they relate to solving these two challenges.
I am not going to dive too deeply into why these challenges make deploying VDI difficult on a large scale as this has been discussed many times in great detail. There have been quite a few passionate discussions about whether one should do persistent VDI, non-persistent VDI or even just stick with XenApp/RDS desktops. Here are some links to some of those discussions.
The Devil in the Datacenter
As it relates to VDI, the persistent desktop has become the devil in the data center. Here is a quick summary as to why.
Persistent vs Non-Persistent VDI
In an ideal world we should be able to give users a clean gold image of Windows 7 and dynamically layer or deliver their applications and personalization settings on demand. This non-persistent approach significantly reduces cost and complexity and facilitates high availability. If designed properly, a user could connect to Win7 VM #1 today and then connect to Win7 VM #99,999 tomorrow and it should not matter. Users get the same personalized desktop experience with their applications, their profile settings, and their data no matter which desktop they connect to.
The challenge is that applications have proved to be difficult. Coming up with one (or maybe a few) gold images with the correct base applications, as well as using App-V, XenApp/RDS, ThinApp, etc. to deliver the non-base applications, is not easy. As a result, many companies have not successfully made the transition to non-persistent VDI and instead decide to implement assigned persistent VDI.
Unfortunately, this approach has major drawbacks, turning it into the devil in the data center.
- Storage Meltdown! Persistent VDI VMs are storage hogs in terms of space as well as IOPS consumed. Windows desktop workloads in their traditional form were not meant to be run in the data center. This means we need to have expensive SSD storage or 3rd party software solutions to deal with the problem. We have SSD, thin provisioning, data deduplication, write coalescing, etc… to try and reduce the storage impact of persistent VDI. Unfortunately, all of these new vendors and their products and technologies are simply treating symptoms caused by persistent VDI and not the real problem, which is persistent VDI!
- CPU Hog! In addition to being a storage hog, persistent VDI is also a CPU hog. When Windows updates are pushed out, they are pushed out individually to each VM. That means when you need to install a Windows or Office service pack and you have 10,000 VMs, you will run that install 10,000 times. That is not only a massive storage hit, but also a CPU hit. Worse yet, can you imagine what happens when you need to roll a hotfix back? You get hit again. Additionally, most companies insist on anti-virus conducting full scheduled scans of all disk drives on a persistent Windows VM. This means extra CPU overhead for this task as well. With non-persistent VDI, you update the gold image once and everyone gets the new version. Also, if you have a read only gold image, you can run a scheduled AV scan once per gold image instead of once per VM and simply configure your non-persistent VMs to scan only on actual read/write activity.
- High Availability. Persistent VDI cripples your ability to provide a highly available solution. With persistent VDI, you are locking a user to a single VM, on a single LUN, on a single SAN, attached to a single VLAN, hosted on a single hypervisor cluster, on a single hypervisor platform, in a single data center, hosted in a single VDI brokering system. If anything goes wrong with any component in the stack, the user cannot access their VM. With non-persistent VDI you can have multiple independent VDI stacks across multiple hypervisor platforms, and potentially across multiple data centers, and you can dynamically connect the user to any VM and they will still have their personalized desktop. This is the only way to achieve true high availability and scalability when dealing with tens of thousands of VDI VMs.
I could go on and on with myriads of other challenges associated with large scale persistent VDI, but I think you already get the point.
Exorcising the Demons
I am happy to say that the technologies exist today to finally get the devil out of the data center! Here’s how you can do it!
The Death of IOPS
There have been numerous improvements over the last year or two in getting control over the IOPS generated by VDI. Some of these solutions focus on using fast SSD storage so that the storage can actually handle the IOPS. There are many hardware solutions on the market today that address this, such as Whiptail, Pure Storage, XtremIO, Tintri, etc. You can take your pick as they all do very similar things and offer great performance. Traditionally, these have been expensive solutions; however, the price has begun to come down on this hardware and it will only continue to get better and cheaper.
There are also software-based solutions that can solve the IOPS challenge today as well. With the ever decreasing cost of RAM and the increasing RAM density in servers today, it makes sense to use existing RAM within each hypervisor host to solve the IOPS challenge. Atlantis ILIO has a fantastic solution that has been on the market for several years now, addressing the IOPS challenge with a pure software solution using resources within each host. Additionally, with the release of Citrix Provisioning Server 7.1 (released with XenDesktop 7.1 and later) we have a new feature called RAM Cache with Disk Overflow. This feature basically solves all of the IOPS performance issues associated with non-persistent VDI with just a small amount of RAM and a little bit of in-guest write coalescing. I encourage you to check out the following links for details on both the ILIO and PVS software-based solutions:
So with the purchase of Flash/SSD hardware, or with a software solution that leverages existing hypervisor resources, we can completely solve the IOPS challenge today in a cost-effective manner. However, in order to make the most of some of these IOPS-killing features, and to get maximum scalability and availability when we start talking about tens of thousands of VDI VMs, we need to commit to using non-persistent VDI. So let me show you how you can make that a reality!
Replacing Persistent VDI with Non-Persistent VDI
After 16+ years of helping hundreds of customers across every industry around the world deploy Citrix End User Computing (EUC) systems, I can honestly say that the requirements of 80% of all users could easily be met with either a non-persistent XenApp/RDS session or a non-persistent Windows desktop. All it takes to make this a reality is proper use of the following three strategies.
1. Know your users and their applications.
This is the most important and most overlooked item. It seems every customer gets stuck in “analysis paralysis” and simply does not know how to remediate their applications. They end up doing a basic audit of what is installed and come up with some crazy number of 1000+ unique applications. They see this massive list and simply have no idea how to start tackling it. I have helped many customers through this process and invariably what we find is that 80% of their users really only use about 20% (or less) of the total number of applications. Making VDI and thin client computing a reality for 80% of your users does not require remediation of all of the applications. You need to spend the time up front actually meeting, speaking and physically working with your users to figure out what is truly needed. You will often find that simply remediating an additional 10 – 30 applications will get you to the point where 80% of your users can be fully serviced by a non-persistent desktop.
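As a rough illustration of that 80/20 analysis, here is a short Python sketch of my own (using made-up audit data, not any particular inventory tool's format) that greedily picks the most widely used applications until a target share of users is fully covered:

```python
from collections import Counter

def remediation_shortlist(usage, coverage_target=0.80):
    """usage maps each user to the set of applications they actually use.
    Returns a greedy shortlist of apps that fully covers at least
    coverage_target of the user base."""
    app_counts = Counter(app for apps in usage.values() for app in apps)
    shortlist = set()
    # Add the most widely used apps first until enough users are covered.
    for app, _ in app_counts.most_common():
        shortlist.add(app)
        covered = sum(1 for apps in usage.values() if apps <= shortlist)
        if covered / len(usage) >= coverage_target:
            break
    return sorted(shortlist)

# Hypothetical audit of five users: four need only Office apps,
# one power user also needs a CAD package.
audit = {
    "alice": {"word", "excel"},
    "bob": {"word", "excel"},
    "carol": {"word"},
    "dave": {"word", "excel"},
    "erin": {"word", "cad"},
}
print(remediation_shortlist(audit))  # -> ['excel', 'word']
```

Remediating just those two apps covers 80% of this (toy) user base; the CAD user stays on a fat client or persistent VM, exactly as described above.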
You should start with the easy users like secretaries, warehouse workers, office management, support staff, etc… However, every customer wants to start VDI by picking their most difficult set of users. Instead of starting with basic office/knowledge workers or task workers, they pick their engineers, IT developers, financial traders, traveling sales people, executives, etc… These high end, high profile users often account for less than 20% of the actual user base, but they are the ones that are most demanding and have the broadest application set. Don’t worry about the 20%. They can stay on persistent VDI or stay on fat laptops/desktops. If you can get 80% of your general workforce on non-persistent VDI or an RDS Desktop, that is a major win and worth moving forward!
2. Use existing tools and methods to deal with applications.
We have many tools and methods for delivering applications dynamically into non-persistent desktops. Some of these tools include App-V, Thin App, XenApp/RDS, etc… These are proven tools that work great in a non-persistent environment. These tools are not perfect and do not address every requirement; however, remember that you do not need to remediate all applications. You are trying to remediate just enough applications so that you can get the requirements of 80% of your users met. You will often find that with the right combination of placing applications into the base image, virtualizing apps with App-V/Thin App and hosting certain applications on XenApp/RDS, you can meet the requirements for 80% of your users.
Additionally, it is perfectly acceptable to have more than one gold image. Everyone wants to manage one gold image, but is it really that difficult to manage an extra two, three or four images? If you can’t manage three or four gold images, then IT might not be the right career for you. That said, I would caution against going crazy with creating multiple images. While it is OK to manage a few additional images, you don’t want to end up with 100 different gold images.
3. Implement a User/Admin Installed Application Technology.
If all else fails, there are other technologies on the market that let you dynamically attach a virtual disk to a non-persistent VM so that applications can be installed either by the user or by an administrator. Citrix Personal vDisk (PVD) was an example of such a technology; however, it has a fatal flaw as it mounts the user’s vDisk at system boot instead of user logon. This means that with Citrix PVD, you must still assign the user to a specific VM because their personal vDisk is physically attached to only one VM. For this reason, I cannot recommend using PVD.
However, there are other solutions that will dynamically attach a user’s personal disk at logon instead of at system boot. This type of solution is much more flexible because it allows the user to log on to any non-persistent VM and still get their personal applications. One example of such a technology that I would recommend is CloudVolumes 2.0. With CloudVolumes 2.0, you give the user a VHD file that lives on a CIFS share. When the user logs on, it dynamically mounts the VHD file, instantly merging/blending the C: drive of the non-persistent VM with the contents of the VHD file. This provides the user with everything available within the base gold image as well as any one-off or other applications that were installed into the user’s personal VHD file. VHD mounting from CIFS (especially with SMB 2.1+) is a fast, efficient method that fully leverages all the native guest and file services caching mechanisms. It works beautifully and scales well! Citrix had a similar VHD mounting technology in the Citrix Application Streaming feature of XenApp 6.5, and it would have been easy for us to implement user installed apps with our Application Streaming technology. Unfortunately, we killed Application Streaming, so now you need a third party solution like CloudVolumes. I encourage you to check them out. Here is a link to a great webinar about CloudVolumes.
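To show the mount-at-logon idea in its most bare-bones form (CloudVolumes and similar products use filter drivers to merge the volume in real time, so this is only a conceptual sketch with made-up server and share names), a logon task could generate and run a diskpart script that attaches the user's personal VHD:

```python
def build_attach_script(username, share=r"\\fileserver\uservdisks"):
    """Build the diskpart script text that attaches this user's
    personal VHD. The share path is hypothetical; on Windows you
    would save this text to a file and run `diskpart /s attach.txt`
    at logon. (Whether a VHD can be attached directly from a UNC
    path depends on the Windows and SMB versions in play.)"""
    vhd_path = rf"{share}\{username}.vhd"
    return "\n".join([
        f'select vdisk file="{vhd_path}"',
        "attach vdisk",
    ])

print(build_attach_script("jdoe"))
```

Again, this only attaches a disk; the real value of products in this space is the seamless merge of the attached volume into C: and the registry, which a plain mount does not provide.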
In addition to CloudVolumes there are other 3rd party approaches to handling the application integration challenge. Another new company that comes to mind is FSLogix. Instead of giving each user their own VHD, they have a unique solution that allows all applications to simultaneously be placed into a single gold image. The applications that are visible from within the gold image are controlled based upon user/group permissions. I encourage you to check them out as well.
If the above strategies are leveraged, most organizations will be able to address 99% of the barriers that forced them to implement persistent VDI! By combining the IOPS-killing features of new SSD hardware or software such as PVS 7.1 and Atlantis ILIO with proven application remediation strategies, along with technology such as that of CloudVolumes or FSLogix, you can deliver fully functional, highly available, cost-effective non-persistent VDI for the majority of your users today!
I called out a few specific products that can solve the major barriers to VDI adoption today, but the reality is that this style of VDI delivery will become the standard solution for delivering VDI and will eventually be available across all vendor platforms. I fully expect (and as consumers you should demand) that the storage capabilities of PVS and ILIO become standard features of every major hypervisor. We can already see the hypervisor vendors moving in this direction; VMware with VSAN and Content Based Read Cache (CBRC), as well as Microsoft with CSV Caching and SMB 3.0 shares, are getting close, but still not quite there. It will not be hard for them to solve this challenge and I think they will. Here is what they can (and hopefully will) do.
VMware simply needs to add two new features as follows:
- Write Caching. As part of the VMware Tools or as a virtual hardware device managed by the hypervisor, VMware could easily cache and coalesce write operations for non-persistent VMs. Can you picture a virtual write cache enabled RAID card with configurable memory amount that could be assigned to each VM? That would be great!
- Shared NFS Datastores. What we need is a read-only NFS datastore that can be used to store gold images. This datastore should be able to be globally shared across multiple vCenter and Cluster instances. This should not be that hard to do. It would allow you to drop a VMDK file or update a gold VMDK file in an NFS volume that is shared by many separate vCenters. Since it is read-only, all of the VMs cloned off this gold image in the shared NFS datastore would put their differential disks on local or shared storage that is unique per VMware cluster. This would simplify the process for rolling out new images across large VDI environments that have multiple vCenters and clusters.
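The read-only base plus per-VM differential disk model above boils down to simple copy-on-write semantics, which this little Python sketch illustrates (a conceptual toy of my own, not VMware's actual disk format):

```python
class DiffDisk:
    """Copy-on-write toy: every VM reads through to one shared,
    read-only gold image, while its own writes land only in a
    private differencing map."""

    def __init__(self, gold):
        self.gold = gold   # shared read-only base: block_id -> data
        self.diff = {}     # this VM's private writes

    def read(self, block_id):
        # A block this VM has modified masks the gold-image copy.
        return self.diff.get(block_id, self.gold.get(block_id))

    def write(self, block_id, data):
        self.diff[block_id] = data  # never touches the shared base

# One gold image, two independent VMs cloned from it.
gold = {0: b"boot", 1: b"apps"}
vm1, vm2 = DiffDisk(gold), DiffDisk(gold)
vm1.write(1, b"patched")
print(vm1.read(1), vm2.read(1))  # vm1 sees its write; vm2 still sees gold
```

Because the base is never written, it can safely be shared (and cached) across clusters and even vCenters, which is exactly why the read-only restriction makes the shared-datastore idea feasible.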
You combine these new theoretical features above with CBRC and VSAN and you have a killer non-persistent VDI system.
Microsoft has introduced two new features that are helping them move in the right direction as well. They have Hyper-V caching of disk operations from Cluster Shared Volumes (CSV) and they have the ability to host VHDs on SMB 3.0 shares. The CSV read caching is similar to VMware’s CBRC technology. However, this is not currently enough. What Microsoft needs is basically the same set of capabilities that VMware is currently lacking.
- Write Caching. Microsoft needs to implement a write caching and coalescing driver for their non-persistent VMs just like I described for VMware above.
- Hyper-V SMB 3.0 Caching. If Microsoft would enable Hyper-V to cache reads from SMB 3.0 shares and would allow multiple SCVMM instances to read a gold image from the same SMB 3.0 share, they would have something similar to the shared NFS datastores I described above for VMware.
XenServer tried to fix this issue years ago with a feature called IntelliCache. Unfortunately, IntelliCache was a major dud because it required the placement of SSD drives in each server. This hardware requirement, as well as the need to use Citrix Machine Creation Services (MCS), is why I never recommended or endorsed this feature. The reason IntelliCache has to lean on local SSDs is the limitation of a 32-bit Dom0; this major Achilles heel in XenServer is why memory cannot be used to solve the problem. I believe there is light on the horizon for XenServer, as there are plans to finally revamp Dom0 and make it 64-bit. Once that happens, XenServer will also be able to add the same native enhancements that I listed for both VMware and Hyper-V.
User/Admin Installed Apps
While new technologies from companies such as CloudVolumes and FSLogix do the best job today of providing the ability to dynamically deliver applications into a non-persistent VM, I fully expect other vendors to mature and offer this type of technology as well. It will not be long before such capabilities ship as a core feature of Microsoft, Citrix, VMware or some other vendor technology.
In summary, I think the future of VDI is bright, and 2014 is the year that the technologies are finally mature enough to deliver large-scale non-persistent VDI to tens of thousands of users in a fast, efficient and cost-effective manner that addresses the majority of all user requirements.
I would love to hear your thoughts and I wish you success as you look to leverage VDI as a tool for your business!