In a presentation to analysts in NYC last Thursday, Citrix’s CEO Mark Templeton announced “Project Kent,” a set of technologies that Citrix is working on to let users access critical IT applications and facilitate communication after a major disaster.
The Back Story
Disaster recovery has always been challenging for IT. In the early days, the terms “disaster recovery” and “data backup” were often used interchangeably, even though they were far from the same thing. Data backup meant that your data would be safe in the event of a disaster, often with the help of offsite tape storage vaults. The problem with data backup was what happened after a major disaster. If something really bad happened and you lost a datacenter, you could call the tape vault and get all of your tapes back. Great! But then what? Did you have any hardware to restore them to? How long would it take you to rebuild your physical infrastructure to the point where the content on the tapes would actually be usable? What about the users? In the event of a natural disaster or a major fire, it’s possible that the users’ workstations would also have been destroyed. How long would it take to rebuild the workstations and get all of your applications installed?
Obviously this is a perfect scenario for server-based computing and something that Citrix has been playing up for years. If all your applications were running on centralized terminal servers, then you wouldn’t have to worry as much about the end users’ workstations. You could focus on the servers, applications, and data, and then tell your users to connect from home or buy a bunch of thin client devices and have them up in a few hours.
This all sounds great in theory, but there are still some major technical challenges to getting this all working right, even in today’s world. In fact, the disaster recovery module of our training class still ranks as one of the most lively discussion topics because there is no single perfect solution.
The problem is this. For most organizations that are concerned about keeping the doors open during and immediately after a major disaster, the solution has been to build a backup datacenter. This backup datacenter may be just a “cold” location with a bunch of powered-off hardware that’s ready to be used when the primary location fails. Or, the backup location may be a full-blown replica of the primary datacenter, complete with continuous data replication and live servers, ready to become the new primary datacenter at a moment’s notice.
In fact, Citrix even added some cool technology into Presentation Server 3 and 4 called “Zone Preference and Failover.” This technology allows you to specify that certain server farm zones should be used for certain groups over others. You can configure your environment so that your users all use Presentation Servers in the primary location, but if for some reason that location is not available, they’ll automatically use servers from another location. Pretty cool!
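To make the idea concrete, here is a minimal sketch of preference-ordered zone selection. It is not Citrix's implementation or API. The function name, the zone names, and the availability map are all made up for illustration.

```python
from typing import Dict, List, Optional


def pick_zone(preferences: List[str], available: Dict[str, bool]) -> Optional[str]:
    """Return the first zone in the preference list that is currently available.

    Hypothetical helper: 'preferences' is an ordered list of zone names for a
    group of users, and 'available' maps zone names to an up/down flag.
    """
    for zone in preferences:
        if available.get(zone, False):
            return zone
    return None  # no zone in the list can take the users


# Assumed policy: headquarters users prefer the primary zone, then the backup.
hq_preferences = ["primary", "backup"]

normal = pick_zone(hq_preferences, {"primary": True, "backup": True})
disaster = pick_zone(hq_preferences, {"primary": False, "backup": True})
print(normal, disaster)
```

In normal operation the primary zone is chosen. When the primary is marked unavailable, the same policy falls through to the backup zone without anyone editing it.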
There’s only one major problem: How do you actually route the users from the primary location to the secondary location?
To understand this problem, think about how Presentation Server actually works. These days, most people use Web Interface (or the Program Neighborhood Agent, which in turn uses Web Interface) to access applications and content published via Presentation Server.
Imagine a bunch of users at corporate headquarters. They connect via Web Interface to local Presentation Servers. There is also an off-site datacenter with backup Presentation Servers. What happens in the event of a disaster?
The answer depends on the type of disaster. If there is a major problem in the main datacenter, the backup Presentation Servers should be able to take over, right? But how will the users access those servers? If the Web Interface server is in the main datacenter that’s no longer available, then the users’ browsers won’t be able to connect at all when they try to access their applications. (They won’t even get an HTTP error such as a 404, since there is no web server left to send one.)
To mitigate this, you can instruct your users to access a backup URL in the event that the primary one is not available. But if that’s the case, why bother with “automatic” failover of your backend Presentation Server environment when your users need to fail over manually anyway?
This scenario illustrates the unique challenge with “pure” server-side failover plans. If the failover intelligence is purely in the server, and the server is not visible or available to the client, then how can the client receive notice of the failover?
The obvious solution is to put some failover intelligence on the client. One way to do this is to make it so that the first time a client device hits a server, it gets a list of other backup addresses it can use should this current server ever not be available. In fact, this is what Citrix does with their perimeter access device, the Citrix Access Gateway. This makes a lot of sense except for one thing—it means that you have some kind of client software or agent installed on your client device. Uggh!
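The "learn the backup list on first contact" approach can be sketched in a few lines. This is only an illustration of the general technique, not how the Citrix Access Gateway client actually works. The class name, the probe callback, and the example.com addresses are assumptions.

```python
from typing import Callable, List, Optional


class FailoverClient:
    """Hypothetical client that remembers backup addresses handed out by servers."""

    def __init__(self, first_address: str,
                 probe: Callable[[str], Optional[List[str]]]):
        # probe(address) returns that server's list of backup addresses when
        # the server is reachable, or None when it is not.
        self.known = [first_address]
        self.probe = probe

    def connect(self) -> Optional[str]:
        # Try every address learned so far, in the order it was learned.
        for address in list(self.known):
            backups = self.probe(address)
            if backups is not None:
                # Remember any backup addresses this server advertises.
                for backup in backups:
                    if backup not in self.known:
                        self.known.append(backup)
                return address
        return None  # nothing reachable


# Simulated environment (made-up addresses): each gateway advertises the other.
servers = {
    "gateway.hq.example.com": ["gateway.dr.example.com"],
    "gateway.dr.example.com": ["gateway.hq.example.com"],
}
down = set()


def fake_probe(address: str) -> Optional[List[str]]:
    if address in down or address not in servers:
        return None
    return servers[address]


client = FailoverClient("gateway.hq.example.com", fake_probe)
first = client.connect()            # normal day: reaches headquarters
down.add("gateway.hq.example.com")  # the primary datacenter is lost
second = client.connect()           # falls back to the learned backup
print(first, second)
```

The sketch also shows the catch. The client only knows about the backup because it reached the primary at least once before the disaster, and something on the client device has to store that list.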
There are of course other ways to point clients to new servers. A common way to do this is via a DNS change. But this brings its own challenges. DNS updates require time to propagate, because resolvers and clients keep serving the cached record until its time-to-live expires. And of course you can’t have your DNS servers in the same datacenter that you’re trying to protect, because if you lose that datacenter then you’d lose your DNS servers too. So this means that you’ll have to use an external DNS server, but doing so introduces another point of failure. In fact, losing those servers could render your primary datacenter inaccessible even if it were fully functional.
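The propagation delay comes from caching. The toy simulation below is not a real DNS library: the resolver class, the one-hour TTL, the host name, and the documentation-range addresses are all assumptions. It shows a caching resolver handing out the old address until the cached record expires.

```python
from typing import Dict, Tuple


class CachingResolver:
    """Toy resolver that caches answers for a fixed TTL (illustration only)."""

    def __init__(self, authoritative: Dict[str, str], ttl_seconds: int):
        self.authoritative = authoritative  # stands in for the real DNS zone
        self.ttl = ttl_seconds
        self.cache: Dict[str, Tuple[str, int]] = {}  # name -> (address, expiry)

    def resolve(self, name: str, now: int) -> str:
        cached = self.cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # still within the TTL: serve the cached answer
        address = self.authoritative[name]
        self.cache[name] = (address, now + self.ttl)
        return address


name = "apps.example.com"
zone = {name: "192.0.2.10"}            # primary datacenter (made-up address)
resolver = CachingResolver(zone, ttl_seconds=3600)

before = resolver.resolve(name, now=0)    # cached at t=0
zone[name] = "198.51.100.10"              # failover change made at t=60
stale = resolver.resolve(name, now=1800)  # 30 minutes in: still the old address
fresh = resolver.resolve(name, now=3600)  # TTL expired: picks up the new address
print(before, stale, fresh)
```

A shorter TTL shrinks the stale window, but it also means clients query the DNS servers more often, which makes everyone depend on those servers even more.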
So far we’ve only looked at some of the technical problems associated with site-to-site failover in a disaster. Of course the other aspect to this is the human and logistical side of things. In a major disaster, IT staff and users will probably not be able to gain physical access to any of the primary servers, and it’s possible that users would be spread out all over the place. How do people get into their apps? How do they even know whether they should use their apps? How does the IT staff communicate?
This is where Project Kent comes in.
Citrix Project Kent
Even before Project Kent, Citrix already had a lot of products that can facilitate application access in a disaster. Presentation Server can ensure that applications can be accessed from random client devices. The Citrix Access Gateway SSL-VPN can ensure that these applications can be accessed securely from outside the firewall. GoToMyPC ensures that users can access their work computers from home.
Project Kent wraps all of these technologies, plus several others, together in a way that really is cool. Phil Winslow, an equities analyst who tracks Citrix for Credit Suisse, attended the briefing last week. (I was not invited.) Here’s his summary of Project Kent:
Enterprises will locate Project Kent technology in all key locations, which will most likely be integrated into the Citrix Access Gateway. The appliance will function as a "Business Continuity Manager," performing the following roles: Emergency Portal, Alert Server, SMS notifications, Roll Call, USB Key Management, Telephony Redirection, Secure Remote Access, Instant Messaging, etc.
The idea is that an enterprise issues its "Emergency Response Team" (ERT) with red USB keys and its users with black USB keys. In the event of a business continuity or disaster recovery scenario, an ERT will find any PC (Cybercafé, home PC, etc.), insert the red key, which will ask a few questions, and initiate a "state of emergency." This will in turn message (via Blackberry, SMS, phone, etc.) other ERTs that an emergency has been declared. The Business Continuity Manager appliance will now assume a role to coordinate all further activities and will become an enterprise's "Emergency Portal." As users and other ERTs connect via their USB keys, the appliance will provide access, priorities, coordinate activities and facilitate communications, send alerts, conduct roll calls, etc.
Personally, I think this Project Kent thing is really cool. It really shows the value of what can be done by combining a bunch of technologies that Citrix already has in a clever way. Now, if they can only figure out a way to prevent a rogue red-keyed employee from having some fun...