Last week I gave a breakout session at Citrix Synergy’s Virtualization Congress called "The REAL cost of VDI." This was not about the cost of losing your job if you built a bad VDI environment; rather, it was about the hidden costs of VDI that many people don’t consider until they’re like, “Oh shit” during the middle of their deployments.
Before we jump into this, I want to point out once again that I like VDI where it makes sense. (Watch this video of me presenting “TS versus VDI” to understand where each makes most sense.) I should also point out that for this entire article, I’m talking about the VDI flavor of desktop virtualization, which is server-based computing with users connecting to single-user VMs running in a datacenter. I’m sure there are hidden costs in the other flavors of desktop virtualization too, it’s just that most of those are too new for us to understand the full cost structures yet.
A quick note about cost models
The purpose of this article is to discuss some of the unexpected expenses that crop up in VDI projects. It’s NOT about how to perform your own TCO or ROI analysis. While you can factor some of these things into your own models, it's really easy to lie or mislead with cost models. I'm not saying that all cost models are bad; I'm just saying that you can make them show whatever you want. (There’s a great book by Gerald Everett Jones called “How to Lie with Charts.” I’d love to write one some day called “How to Lie with Cost Models.”) [December 2009 update: I did write this article! :)]
As those of you who’ve been reading this site for a while know, VDI is just a flavor of server-based computing (SBC), just like Terminal Server. So when we’re thinking about the hidden costs of VDI, we can actually break those costs into two categories:
- The hidden costs you find in any type of server-based computing
- The hidden costs you find only in the VDI-type of server-based computing
Let’s first take a look at the hidden costs that we find in all flavors of server-based computing. This is the stuff that’s well known to old-school Terminal Server or Citrix engineers.
The hidden costs of server-based computing
Not being able to get rid of legacy systems
A lot of people implement server-based computing to save money. If you're thinking about doing this, it's important that you figure out if your new SBC system will entirely replace an existing system or if it will be in addition to an existing system.
For example, if you can remove every single fat client from your environment and go 100% SBC, then I think yes, there are huge savings there.
But if your server-based computing system can only replace 80% of your apps, then that means you still have to maintain your old system for the other 20%. That means you need your old patching system, app deployment system, etc. And in that case, even though the new server-based computing system is easier to manage than your old system, it actually ends up giving you negative ROI because it's a whole system you have to implement in addition to your old system. It's just that much more stuff to break.
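To make that arithmetic concrete, here's a rough sketch of the 80% scenario. Every number and name in it is made up for illustration (the cost units are arbitrary). The one assumption that matters is that the old system's fixed costs (patching, app deployment, and so on) stick around until the very last app is gone.

```python
def annual_cost(total_apps, apps_on_sbc,
                legacy_fixed=200, legacy_per_app=1,
                sbc_fixed=120, sbc_per_app=1):
    """Yearly cost in arbitrary units. All defaults are hypothetical placeholders."""
    apps_on_legacy = total_apps - apps_on_sbc
    cost = 0
    # The legacy tooling stays in place as long as even one app still needs it.
    if apps_on_legacy > 0:
        cost += legacy_fixed + legacy_per_app * apps_on_legacy
    # The SBC platform has its own fixed cost as soon as anything runs on it.
    if apps_on_sbc > 0:
        cost += sbc_fixed + sbc_per_app * apps_on_sbc
    return cost

before = annual_cost(total_apps=100, apps_on_sbc=0)     # old system only
partial = annual_cost(total_apps=100, apps_on_sbc=80)   # 80% moved, both systems running
full = annual_cost(total_apps=100, apps_on_sbc=100)     # old system fully retired

print(before, partial, full)
```

With these placeholder numbers the 80% case costs more than doing nothing, and only the 100% case saves money. Your own numbers will differ, but the shape of the problem is the same.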
Changes to user paradigms
Server-based computing offers a lot of advantages over traditional computing. Unfortunately a lot of the stuff that's really cool to us as administrators can sometimes confuse the users. And confused users lead to more helpdesk calls which cost money.
A great example is Citrix's SmartAccess capabilities that are integrated in their Advanced Access Control product. This set of technologies is amazing, and I've written about how awesome they are on several occasions.
Unfortunately they're also a brilliant way to confuse the heck out of your users.
For example, these technologies have the ability to scan a client machine and then change the way an application behaves (or even hide entire applications) based on the results of that scan. Good for us admins! But imagine this from our poor users' standpoint, where sometimes cut-and-paste works and sometimes it doesn't, or sometimes they see an app and sometimes they don't. The cost of supporting these users and their new technology is very real. And the problem is getting worse as the SBC vendors do more and more to “hide” the fact that users are using remote apps.
Thinking things will scale linearly
Since all SBC solutions (TS and VDI) consolidate user execution in the data center, we need to have a good understanding of hardware requirements and scalability before we buy hardware or even get approval for the whole environment. This leads to one of the oldest hidden costs there is, which is where we do a test and find that we can support x users per server, so we think that we can automatically support nx users on n servers.
And anyone who’s ever deployed Citrix knows it doesn’t quite work that way. ;) It seems like there’s always some bottleneck that has to be addressed but that we don’t find until we’re deep into the project. So when we’re modeling this stuff, we need to think about all the “other” stuff that we need to scale up as we add users. Think about networking, backup capacity, disk bandwidth, domain controllers, etc.
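Here's a small sketch of what "thinking about the other stuff" could look like in a sizing model. The users-per-server figure and the shared-resource limits are hypothetical placeholders, not measurements; the point is only that the naive server count says nothing about the shared pieces.

```python
import math

def servers_needed(users, users_per_server):
    """Naive linear estimate: the pilot's users-per-server figure, rounded up."""
    return math.ceil(users / users_per_server)

def overloaded_shared_resources(users, capacity_in_users):
    """Names of shared resources whose (assumed) user capacity is exceeded."""
    return sorted(name for name, limit in capacity_in_users.items() if users > limit)

# Hypothetical pilot result and target size.
target_users = 2000
pilot_users_per_server = 50

# Hypothetical ceilings: how many users each shared resource can carry today.
shared_capacity = {
    "networking": 5000,
    "backup capacity": 1500,
    "disk bandwidth": 1000,
    "domain controllers": 3000,
}

print(servers_needed(target_users, pilot_users_per_server))
print(overloaded_shared_resources(target_users, shared_capacity))
```

In this made-up case the linear math says 40 servers, but backup capacity and disk bandwidth both run out well before 2,000 users, and neither shows up in the server count.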
The hidden costs of VDI
Even though the VDI version of SBC is a lot newer than the TS version of SBC, we still have a pretty good understanding of where the big hidden costs are.
Hidden cost #1: Using VDI
If you choose to use server-based computing, you need to understand that VDI is more expensive than Terminal Server. Period. This is something that everyone agrees on, including Citrix, Microsoft, VMware, and Brian.
By the way, in case you’re wondering “Why would anyone use VDI if it’s more expensive than TS?”, people choose VDI because it has features that they can’t get in TS. But these features come at a price; namely, money. :) VDI is more expensive than TS. And that’s ok, because IT is not about implementing the cheapest solution—it’s about implementing the cheapest solution that meets a business’s needs.
The reason I list this as a "hidden" cost is because a lot of people end up using VDI in scenarios where Terminal Server would work fine. This usually happens because they only compare VDI to their traditional environment, and they don’t even consider TS-based solutions.
Storage
Think about how much disk space all those copies of Windows running on user desktops consume in your environment. That's got to be what, 20GB per user? Now imagine if you implement VDI. You take 20GB per user and move that from cheap throw-away storage on your desktops to expensive SAN-type storage in your datacenter. That's a crazy cost!
Of course in today's environment, most people don't actually have all those copies of Windows stored in the datacenter. Today’s VDI deployments typically have data deduping in the SAN, or they use one of the “thin provisioning” solutions like Citrix Provisioning Server or VMware View Composer with Linked Clones.
The weird irony of these thin provisioning solutions is that they only make sense when all your users will use the same disk image. But if all your users can share a similar disk image, then why are you using VDI in the first place? Isn’t that what TS is for? And if your users each need their own totally custom disk, then you still have to manage, store, and back all that stuff up.
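A back-of-the-envelope sketch of the storage math follows. The 20GB per user comes from the guess above; the user count, the per-user delta size, and the per-GB prices are placeholders I made up so the example runs, so plug in your own.

```python
def full_copy_storage_gb(users, image_gb):
    """Every user gets a complete, private copy of the disk image."""
    return users * image_gb

def shared_image_storage_gb(users, image_gb, delta_gb_per_user):
    """One shared base image plus a per-user delta (the thin-provisioning idea)."""
    return image_gb + users * delta_gb_per_user

def storage_cost(gb, price_per_gb):
    return gb * price_per_gb

users = 1000
image_gb = 20            # the rough per-user figure from the text
delta_gb_per_user = 2    # hypothetical

# Hypothetical relative prices per GB: desktop disk vs. SAN.
desktop_price, san_price = 1, 10

full_gb = full_copy_storage_gb(users, image_gb)
shared_gb = shared_image_storage_gb(users, image_gb, delta_gb_per_user)

print(storage_cost(full_gb, desktop_price))   # full copies on desktop disks
print(storage_cost(full_gb, san_price))       # the same full copies on the SAN
print(storage_cost(shared_gb, san_price))     # shared image plus deltas on the SAN
```

The shared-image number only holds if the deltas stay small and everyone really can live on one base image, which is exactly the irony described above.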
Another aspect of storage is the storage bandwidth between the VDI VM hosts and the storage locations. If you’re using Citrix Provisioning Server, you’d better ask around and find out what the server-to-VDI user ratio is. Same for View Composer with Linked Clones for VMware. Ask them how many VMs you can get per LUN.
Windows Licensing (VECD)
Remember that every VDI environment needs a VECD license, and that’s going to cost you $23 per device per year in addition to your SA license fees. (VECD jumps to $110 per device per year if you don’t have SA.) While that cost is about the same as a TS CAL (so it’s not a hidden cost in the context of “TS versus VDI”), it represents a completely new cost if you’re jumping to VDI after having done things the “old” way for the past several years.
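As a quick sanity check, the VECD line item can be sketched like this. The two per-device prices are the ones quoted above; the device count is a hypothetical example, and the SA fees themselves are deliberately left out.

```python
VECD_PER_DEVICE_WITH_SA = 23      # USD per device per year, on top of SA fees
VECD_PER_DEVICE_WITHOUT_SA = 110  # USD per device per year, no SA

def vecd_annual_cost(devices, has_sa):
    """Yearly VECD cost only; does not include the SA fees themselves."""
    per_device = VECD_PER_DEVICE_WITH_SA if has_sa else VECD_PER_DEVICE_WITHOUT_SA
    return devices * per_device

devices = 1000  # hypothetical environment size

print(vecd_annual_cost(devices, has_sa=True))
print(vecd_annual_cost(devices, has_sa=False))
```

For 1,000 devices that's $23,000 a year with SA or $110,000 a year without, every year, as a brand new line in the budget.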
Complexity of the unknown
The reality is that VDI is still pretty new. The exact estimates vary, but there are probably around 1 million VDI users in the world, versus about 100 million TS users. For TS-based projects, there are books, forums, white papers, articles… everything. For VDI, there’s…. well… there’s a lot less. Even for me, I feel like every time I see a VDI project I’m learning as I go along. Contrast that to TS projects that most of us could do in our sleep.
Again, remember this article is focused on the hidden costs of VDI. I’m not saying that exploring the new unknown is bad. (In fact I think it’s pretty good and cool!) It’s just that you need to understand that this new unknown exploration will cost more than if it were a known thing. (The cost of the unknown is mostly in wasted time looking for solutions and figuring stuff out—both during the project and in support afterwards.)
Not thinking about non-compatible apps
Most VDI cost models are based around multiple users sharing the same base disk images, and the idea is that most of these will be customized on-the-fly by having applications inserted on-demand with application virtualization technologies like App-V, ThinApp, XenApp streaming, InstallFree, etc. The problem is that these app virtualization products aren’t compatible with all apps, meaning that you’ll have to figure out some way to deal with your “other” apps that won’t work here. How do you do that? Do you deploy those apps to local workstations? Do you install them into the base Windows image for the VM? If you do that, how do you regression test them against each other? Or do you build multiple images?
All of this adds complexity that a lot of people don’t think about when they’re building the cost models for their VDI environments.
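To get a feel for the regression testing question, here's a tiny sketch that counts how many app-against-app pairs you'd have to check if you install the incompatible apps into one base image. It assumes the simplest possible model (every app tested against every other app, one pair at a time), which is my simplification and not a testing methodology from any vendor.

```python
import math

def pairwise_test_count(apps_in_image):
    """Number of distinct app pairs to regression test in one shared image."""
    return math.comb(apps_in_image, 2)

for n in (5, 10, 20):
    print(n, "apps in the image ->", pairwise_test_count(n), "pairs to test")
```

The count grows much faster than the number of apps, which is one reason people end up splitting into multiple images, and then they're managing multiple images.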
Not knowing Windows XP well enough
If you thought you knew Windows server before TS, I guarantee that after your first big TS project, you’ll really know Windows! There’s just so much stuff to know, from how SBC handles multiple users to kernel thread quanta scheduling to how the print router prioritizes different print jobs to how userenv.dll loads roaming profiles. And all that knowledge is based on what, fifteen years of knowing Terminal Server?
Now imagine taking something like Windows XP that ordinarily runs on a bunch of desktops out in the office, and bringing that into your datacenter. You need to “datacenter-ize” your copies of XP and really (I mean REALLY) understand how they work (and how your hypervisor deals with scheduling and memory and I/O access and everything). And there are a lot of “gotchas” here that really aren’t that well known.
For instance, did you know that Windows XP will automatically defrag the important system files once it’s been booted three times? Guess what that does to your preciously tiny “delta” disk image files if you happen to have deployed a master disk image before that process kicked off? And while we’re on the topic of disk imaging, does anyone know what exactly should be included in a disk image, and what shouldn’t?
There are about 1,000 questions like that that the industry is just now figuring out, and getting something like this wrong can cost you big time (either in time spent troubleshooting or wasted money buying too much hardware to cover for the poor performance).
Vendor products that change too fast
Citrix had a monopoly in the TS-based SBC space for over ten years. One of the nice things about a monopoly is the nice slow pace of product development and updates. That means that you don’t have to learn too fast and you don’t have to change your environment too fast.
Compare that to the blazing pace of development of VDI products. Both VMware and Citrix released two major products each in the past twelve months. These things are changing so fast that you just can’t keep up. And fast changes mean more time spent learning and studying, which leads to less time actually doing your job, which leads to a higher cost.
These fast-paced changes also mean that you’ll end up changing and upgrading your system more often—another hidden cost.
Not knowing which vendor is going to “win”
This is only a problem if you pick the wrong vendor. :)
But it is a serious point. There are so many VDI vendors in the space today, and the space is too new to know who’s going to “win.” So if you buy a product from one vendor, and then in another few years the space becomes dominated by someone else, that means you’ll have the huge expense of either (1) migrating to the popular vendor, or (2) trying to support an ever-more obscure product.
I’m sure there are more hidden costs out there, but this is certainly a starting point of things to think about. Even if you don’t agree with every one of these points, at least now you can have a list and then check these off one-by-one for your environment instead of being blind-sided and thinking “I wish someone had told me about x.”
So until you’ve done this a few times, add a 15% “wastage” budget item to every VDI project you touch (for both CAPEX and OPEX).
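If you want that rule of thumb in your spreadsheet or script, it's about this simple. The 15% is the figure from above; the CAPEX and OPEX amounts are hypothetical.

```python
def budget_with_wastage(capex, opex, wastage_percent=15):
    """Return (capex, opex) with a wastage line item added to each."""
    capex_total = capex + capex * wastage_percent / 100
    opex_total = opex + opex * wastage_percent / 100
    return capex_total, opex_total

# Hypothetical project budget.
capex_total, opex_total = budget_with_wastage(capex=200_000, opex=50_000)
print(capex_total, opex_total)
```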