This week's conversation on BrianMadden.com about whether you should use local or SAN-based storage for VDI has been one of the best conversations we've ever had on any topic. What's clear after reading the 50+ comments is that people roughly fall into two camps:
- Those who believe that IT services delivered from the datacenter—including VDI—should have a high level of reliability, and
- Those who think, "meh... they're just desktops. How reliable do they need to be?"
(At this point I should mention that it's possible to leverage local storage AND have high availability, but that's not the point of today's article.) The conversation centers on how reliable "desktops" need to be. Obviously this varies from company to company and depends on many things, including why the company is doing VDI in the first place. And while we can't resolve the "How reliable do desktops need to be?" question on BrianMadden.com, we can set the context for answering it in your own environment.
Whenever the question of desktop reliability comes up, the first thing people point out is that traditional desktops and laptops are neither reliable nor redundant. From an SLA perspective, traditional desktops only break one at a time. Individual desktops have no redundancy and actually aren't that reliable: a typical one might break every 2,000 hours (hard drive, fan, memory, whatever). The saving grace, of course, is that a single desktop failure only takes out a single user.
Compare that to VDI desktops running on servers in your datacenter. Your servers, with redundant drives, power, memory, and fans, might fail only every 200,000 hours instead of every 2,000. Even with 40 or 50 users per server, you still have a statistically lower per-user failure rate with server-based desktops, but in practical terms each failure is a bigger deal, since with VDI you're dealing with 40 users down at once instead of just one. (Not that it really matters in the context of this conversation, but a far bigger cause of failure than hardware is human error: dumb users with traditional desktops and dumb admins with VDI.)
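The trade-off is easier to see with numbers. Here's a minimal sketch using the illustrative figures above (a 2,000-hour desktop MTBF, a 200,000-hour server MTBF, 50 users per server). It assumes continuous 24/7 operation and independent failures; the 1,000-user population and the function name are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # assumes machines run around the clock


def yearly_failure_profile(users, mtbf_hours, users_per_machine):
    """Return (incidents per year, users affected per incident, user-outages per year)."""
    machines = users / users_per_machine
    incidents = machines * HOURS_PER_YEAR / mtbf_hours
    user_outages = incidents * users_per_machine  # each incident takes down every user on that machine
    return incidents, users_per_machine, user_outages


USERS = 1000  # hypothetical user population

traditional = yearly_failure_profile(USERS, mtbf_hours=2_000, users_per_machine=1)
vdi = yearly_failure_profile(USERS, mtbf_hours=200_000, users_per_machine=50)

for label, (incidents, blast, outages) in (("Traditional", traditional), ("VDI", vdi)):
    print(f"{label:12} incidents/yr={incidents:8.2f}  users per incident={blast:3d}  "
          f"user-outages/yr={outages:8.2f}  per user/yr={outages / USERS:.4f}")
```

With these inputs, VDI produces about 100 times fewer user-outages per year, but every incident takes 50 users down at once, which is exactly why a server failure feels like a bigger deal.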
Of course, the actual impact of a single server or desktop failure really depends on what it means to be "down." In a perfect world where all user data and personality live on the network, your "downtime" is only as long as it takes to get a new machine in front of the user. Swapping out a laptop might take a few hours for a traditional user, but with VDI, reconnecting to a functional desktop is just a matter of clicking the "connect" button again.
Keeping downtime to a few seconds, though, only works when your data and personality are on the network. And while we all know that the shared-master-with-dynamically-layered-desktop is the long-term goal of VDI, the reality is that most of today's VDI deployments are based on persistent disks, where each user "owns" his or her own disk image. (Right?) If that persistent image is stored on a single server and that server goes down, you're looking at a longer outage as you move drives, restore, and/or rebuild those users' desktops.
But wait, isn't that the way traditional desktops work? If a user loses a hard drive, they're down until you restore their data from backup, and how long that takes depends directly on how you've planned for it. Are you backing up laptops? Are you only backing up data that's on the network in their profile and home drive? Are you providing a service like Dropbox? Do you back up the entire desktop?
At the end of the day, the money you spend backing up and protecting traditional desktops is directly related to how important your company perceives those desktops to be, and moving the desktops to the datacenter doesn't change that. (Well, again, this depends on why you're moving your desktops to the datacenter. If it's truly about availability and not cost savings, then yeah, boot them from a SAN and get live migration and everything. But that's not the reason a lot of people do VDI.)
The level of redundancy you build into your desktop solution, be it traditional or VDI, local or SAN, is a matter of company policy. The technology is merely a tool for delivering a certain SLA.
So what are your desktop SLAs currently? If a user's laptop fails, what's the agreed-upon time to provide a replacement? When that replacement boots up, what's expected to be on it? Are the apps reinstalled? Is the profile back? Is all the user data there?
Why SLAs matter even more for VDI
One of the interesting "side effects" of VDI (and desktop virtualization in general) is that it forces you to "formalize" all aspects of your desktop environment. I'm willing to bet that a lot of companies don't even have written SLAs when it comes to desktops. Sure, you could estimate what your "approximate" SLA is—within 4 business hours for example—but it's probably not formal.
This matters with VDI because higher SLAs mean higher costs. 24/7 four-hour response is more expensive to deliver than 8/5 four-hour response. 99.999% uptime is more expensive than 99.99%. We all know that.
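To put those numbers in concrete terms, the sketch below converts an availability target into an annual downtime allowance and compares the staffed hours per week behind 24/7 and 8/5 coverage. It assumes a 365-day year; the function names are illustrative, not from any particular tool.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def allowed_downtime_minutes(availability_pct):
    """Maximum downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def coverage_hours_per_week(hours_per_day, days_per_week):
    """Hours per week that support staff must be on hand to meet the SLA."""
    return hours_per_day * days_per_week


for target in (99.99, 99.999):
    print(f"{target}% uptime -> {allowed_downtime_minutes(target):.1f} minutes of downtime per year")

print("24/7 coverage:", coverage_hours_per_week(24, 7), "hours/week")
print("8/5 coverage: ", coverage_hours_per_week(8, 5), "hours/week")
```

Going from 99.99% to 99.999% shrinks the yearly allowance from roughly 53 minutes to roughly 5, and 24/7 coverage means staffing 168 hours a week instead of 40.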
The problem with VDI is that there's often a hidden "SLA bump" when you move your desktops from users' desks to the datacenter. (The "bump" is that you move your desktops from an environment with a low SLA to an environment with a high SLA.) You get a bump in SLA without a corresponding plan to bump the costs. This is a problem because companies don't usually account for it when they're working out the cost models and financial justification for VDI. (You know, if you believe in that kind of thing.) Somehow these companies hope to magically increase the SLA for their desktops while expecting the management costs to go down?!?
What does this mean for you?
To avoid falling victim to an unintentional SLA bump, first formalize the existing SLA for your desktops. Even if you do this on your own, figure out what the expectation is for desktop support and write it down. If you're just in the planning stages of your VDI project, ask around to see what people's expectations are for support and uptime. If they're higher for VDI, point that out, and maybe ask whether there's extra budget for it.
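One way to "write it down" is to capture the answers to the SLA questions (replacement time, coverage, whether apps, profile, and data come back) as a structured record, then compare the current desktop SLA with what people expect from VDI. This is a hypothetical sketch: the field names and example values are illustrative, not a standard SLA schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesktopSLA:
    restore_hours: float          # agreed time to get the user a working desktop again
    coverage_hours_per_week: int  # e.g. 40 for 8/5, 168 for 24/7
    apps_restored: bool           # are the apps reinstalled?
    profile_restored: bool        # is the profile back?
    user_data_restored: bool      # is all the user data there?


def sla_bump(current, proposed):
    """List every term where the proposed SLA is stricter (and so costlier) than the current one."""
    bumps = []
    if proposed.restore_hours < current.restore_hours:
        bumps.append(f"restore time {current.restore_hours}h -> {proposed.restore_hours}h")
    if proposed.coverage_hours_per_week > current.coverage_hours_per_week:
        bumps.append(f"coverage {current.coverage_hours_per_week} -> "
                     f"{proposed.coverage_hours_per_week} h/week")
    for name in ("apps_restored", "profile_restored", "user_data_restored"):
        if getattr(proposed, name) and not getattr(current, name):
            bumps.append(f"{name} now expected")
    return bumps


# Hypothetical example: an informal 4-business-hour desktop SLA vs. typical VDI expectations.
current = DesktopSLA(restore_hours=4, coverage_hours_per_week=40,
                     apps_restored=True, profile_restored=False, user_data_restored=False)
vdi_expectation = DesktopSLA(restore_hours=0.1, coverage_hours_per_week=168,
                             apps_restored=True, profile_restored=True, user_data_restored=True)

bumps = sla_bump(current, vdi_expectation)
for item in bumps:
    print("SLA bump:", item)
```

Every line this prints is a place where the VDI plan quietly promises more than today's desktops do, and each one should have a matching line in the budget.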
And remember: desktops and servers are not the same, and simply moving your desktops into the datacenter doesn't automatically mean you have to change your desktop SLA. It's OK for desktop users to be second-class datacenter citizens. If your company wants to upgrade your users to first class, that always costs money.