Recently I've been working on failover and high availability planning for my production servers. As I thought through all of the options I started wondering how most people handle this in their production environments. Specifically, I'm interested in what happens when a standard (or "member") MetaFrame Presentation Server fails. How do you deal with it?
There are really only a few options:
- Restore from backup
- Manual reinstall
- Repeat the original installation process
- Break a mirror from another server
- Mirroring or replication software (maybe?)
I guess the most obvious (and easiest) way to handle this is to restore from backup. Then again, I don't think any of my clients actually back up their Citrix servers (usually on my recommendation). Why not? The Citrix member servers shouldn't contain any user or application data that could be lost, so there's no reason to back them up (well, not all of them anyway). If a server does fail, they can easily rebuild it using one of the other methods.
Another common way to deal with a crashed server is to wipe the drives and reinstall the operating system and applications from scratch. Depending on your environment, this could take anywhere from a few hours to a few days. One of the nice things about reinstalling from scratch is that it gives you a "pristine" server. Of course the downside (in addition to the time wasted) is that most likely this new server would not be 100% identical to your other existing servers, and that can make troubleshooting and management more difficult.
If you originally used unattended installation scripts or images to deploy your servers, then you could just re-run the original deployment process. You'd still have to manually apply all the changes that you made since the original installation date, but at least you wouldn't have to sit through the basic setup stuff again.
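If you keep even a simple dated log of the changes you make to your servers, working out what to reapply after a redeploy gets a lot easier. Here's a minimal sketch of that idea in Python. Everything in it (the log format, the `changes_to_reapply` name, and the sample entries and dates) is made up for illustration; it isn't part of any Citrix or deployment tool.

```python
from datetime import date


def changes_to_reapply(change_log, image_date):
    """Return descriptions of changes made after the image was built, oldest first.

    change_log is a list of (date, description) tuples (a hypothetical format).
    """
    pending = [(when, what) for when, what in change_log if when > image_date]
    return [what for when, what in sorted(pending)]


if __name__ == "__main__":
    # Hypothetical change log for one silo.
    log = [
        (date(2004, 3, 1), "Install hotfix rollup"),
        (date(2004, 1, 15), "Publish accounting application"),
        (date(2004, 5, 20), "Update printer drivers"),
    ]
    # Suppose the unattended install or image dates from February 1.
    for step in changes_to_reapply(log, date(2004, 2, 1)):
        print(step)
```

The point is only that the list of post-install changes should come from a record, not from memory.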
If your failed server has the same hardware as another good server in the silo, you could use the "mirror breaking" deployment method. To do this, just pull one drive of a RAID 1 mirror out of a good server and put a blank drive in its place. That server will automatically rebuild the mirror onto the new blank drive. In the meantime, you can unplug your new server from the network, pop in the drive you pulled, boot Windows, change the computer name, SID, and IP address, reboot, plug the network cable back in, and join it to the domain. (Don't forget to manually delete the old computer account from the domain.) This whole process takes about ten minutes, and you'll have a perfect copy for the new server. The main downside is that you need another server that's identical and stable enough to use as the source image.
The final option is new to me, and frankly I'm not sure whether it would work well in a Citrix environment. This option involves the use of server data mirroring or replication software. I got to thinking about this when I was working on my BrianMadden.com web servers. I'm implementing NSI's Double-Take product, which keeps files (even locked and in-use ones) in sync and replicated across multiple servers. In my case I have two SQL servers. All of the SQL database files on my primary server are continuously replicated to a second server. The primary server supports my website. But if it goes down or if I need to perform maintenance, the software fires up the SQL services on the second server and moves the IP address over, and (since the database files are already there) I have my full SQL database back up and running in about three seconds. The software continuously keeps the replicated copies of the database files up to date.
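To make the failover idea a bit more concrete, here's a rough Python sketch of the decision logic: watch heartbeats from the primary and hand the service over to the standby after a few misses in a row. This is a generic heartbeat pattern, not a description of how Double-Take works internally. The class name, the `max_missed` threshold, and the "primary"/"standby" labels are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Tracks heartbeat results and decides which server holds the service."""

    max_missed: int = 3      # hypothetical threshold, not a Double-Take setting
    missed: int = 0          # consecutive missed heartbeats so far
    active: str = "primary"

    def record(self, heartbeat_ok: bool) -> str:
        """Feed in one heartbeat result; return the server that should be active."""
        if self.active == "standby":
            # Already failed over. Failing back is left as a manual step here.
            return self.active
        if heartbeat_ok:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                # A real product would start services and move the IP here.
                self.active = "standby"
        return self.active


def simulate(heartbeats, max_missed=3):
    """Run a sequence of heartbeat results through a fresh monitor."""
    monitor = FailoverMonitor(max_missed=max_missed)
    return [monitor.record(ok) for ok in heartbeats]


if __name__ == "__main__":
    print(simulate([True, False, False, True, False, False, False, True]))
```

Requiring several consecutive misses keeps a single dropped heartbeat from triggering a failover, at the cost of a slightly slower reaction when the primary really is down.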
This got me thinking. Could software like this be used in a Citrix environment? (I mean beyond the data store itself, in which case it's perfect.) My thought is that you would install this mirroring software on all of your Citrix servers. If one server failed, you would have a live "image" of it on some other server that you could instantly deploy to new hardware. It might be kind of cool.
On the other hand, you'd have to pay for this mirroring software. Double-Take costs $2500 per server, although there are cheaper solutions that don't have quite as many features. (Peer Software, for example, has one that does everything except the automatic failover for $500 per server.)
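Per-server pricing adds up quickly across a farm. Here's a quick back-of-the-envelope sketch in Python using the two per-server prices quoted above. The ten-server farm is a hypothetical number, and the function name is just for illustration.

```python
# Per-server prices as quoted in the text above.
PRICE_PER_SERVER = {
    "Double-Take": 2500,
    "Peer Software": 500,
}


def compare_costs(servers: int) -> dict:
    """Return total licensing cost per product for a farm of the given size."""
    return {name: price * servers for name, price in PRICE_PER_SERVER.items()}


if __name__ == "__main__":
    # Hypothetical farm of ten Citrix servers.
    for product, total in compare_costs(10).items():
        print(f"{product}: ${total}")
```

For that hypothetical ten-server farm, the totals come to $25000 and $5000.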
I'm not sure whether this mirroring software makes sense in a Citrix environment. There are certainly many other ways to recover quickly from a server loss, and if you do have data on each server then why not just use regular backup software? Even in the worst case you should be able to back up one server from each silo and restore from that backup if one of them fails.
So, what do you do if you lose one of your Citrix servers?