This article is Part 2 of a series. I highly recommend that you read Part 1 first.
Building a Network Test Lab
Before you can even begin trying to design and implement a solution, you’ll need to configure a test lab. This is one of the areas where it really doesn’t make sense to try to make these changes on your production network with live users. (Actually, some of these changes are impossible to test on a production network.)
Building a test lab to test the design and configuration of your servers is easy. Just grab some old hardware or a copy of VMware and off you go. But how do you build a test lab that will be relevant for your network troubleshooting and performance tuning? You’ll need to build something that can simulate the low bandwidth, latency, and other issues of your production network.
Fortunately there are several ways you can go about this. The easiest and most effective way is with the network simulation products from a company called “Shunra.” Shunra has hardware- and software-based solutions that you can use to simulate your real network’s bandwidth, latency, packet loss, jitter, and low MTU sizes.
Shunra’s Virtual Enterprise product comes with a Visio-based tool that you can use to model your entire production network. Then you hook up their hardware box to the different segments of your test lab. They even have a tool that records up to 30 days of “live” network performance metrics from your production network that will then automatically make your test enterprise act like your production one.
For testing single network segments, Shunra also has a Virtual Network line of software-based products that do basically the same thing on a smaller scale.
Advantages of Building a Test Lab with Shunra Tools
- Products simulate your entire environment, including all traffic and all protocols
- Easy to use with GUI configuration
- Can use real network data in the lab simulations
- Virtual Enterprise product can simulate multiple segments, so you can test the actual end-to-end multi-site environment
Disadvantages of Building a Test Lab with Shunra Tools
- Since they’re commercial products, you have to buy them.
If you don’t have the budget to buy the Shunra products there are a few other options. There is an open source product called “DummyNet.” DummyNet is a FreeBSD application that you install onto a computer (running FreeBSD) that has two network cards in it. You can configure it for bandwidth limits and latency and then hook it up in between your clients and servers.
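To give you a feel for what a dummynet configuration looks like, here is a minimal sketch using the FreeBSD ipfw command-line syntax. (The numbers are purely illustrative, and exact flags vary between FreeBSD versions, so check the ipfw documentation for your release.)

```shell
# Send all traffic between the test clients and servers through pipe 1
ipfw add pipe 1 ip from any to any

# Make pipe 1 simulate a 128Kbit/s WAN link with 100ms of latency
# and 1% packet loss
ipfw pipe 1 config bw 128Kbit/s delay 100ms plr 0.01
```

With the dummynet box cabled in between your test clients and servers, every session crossing it experiences the simulated link conditions.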
Advantages of DummyNet
- It’s free
- It’s open source, so in theory you can modify it to suit your needs
Disadvantages of DummyNet
- It only runs on FreeBSD, so you have to know something about that to get it working (although they do have a bootable floppy disk image version)
- It requires a separate computer to use it
- No GUI configuration
- All configuration must be done manually (it does not use “real” data)
Finally, if you’re using Citrix MetaFrame, there is a free tool available from Citrix called the “SMC Console.” The SMC Console is actually a sample application that’s meant to show you the kinds of cool things you can do with the Citrix Session Monitoring and Control SDK. The SMC Console is part of the MetaFrame Presentation Server SDK 2.3 and newer. This SDK is available as a free download from the Citrix Developer Network (cdn.citrix.com). You have to log in to download the SDK, but registration is free.
Once you download the SDK, install it and you’ll find the SMC Console as one of the sample applications. (Before version 3.0 of the SDK, the SMC component was an option, so be sure you select that if you want to use the SMC Console.)
Install the SMC Console onto a Citrix MetaFrame server and fire it up. The main screen has a dropdown list that lets you pick which of the current ICA sessions you want to work with. It also shows you all sorts of statistics about that session.
There is a tab called “Display Options” with sliders that you can use to add latency and limit the bandwidth of a particular session.
Advantages of the SMC Console
- It’s free
- Since it’s a sample SDK application, it comes with all the source code
Disadvantages of the SMC Console
- It’s a manual configuration only
- It only works if you’re using MetaFrame
- It only controls ICA traffic and nothing else on the network
How to Address Each Networking Issue
Now that you have a lab where you can experiment and test out your solutions, you can start to think about what changes you’ll make to address each of the issues that are affecting you. Let’s go through the list one-by-one again, starting with bandwidth.
If you think that not having enough bandwidth is an issue in your environment, there are a few things you can experiment with:
- Increase bandwidth
- Do something to ICA or RDP to make it take less bandwidth
- Work with the network people to increase the efficiency of the network itself
Of course the easiest solution is just to add more bandwidth, right? The problem with this is that it doesn’t really address the core issue. Also, if your users are using all of their bandwidth today, what’s to say that they won’t still use all of the bandwidth even after you increase it?
Of course there’s always the chance that you might legitimately have to increase bandwidth. (Maybe you’re trying to support 50 remote users over a single dial-up line.) However, adding bandwidth always costs more money, so it’s something you should only do after you’ve tried the other things outlined here.
Limiting the amount of ICA or RDP data on the network
Instead of trying to keep throwing more bandwidth at the problem, why not attack the source? You can do things to the ICA or RDP protocol to minimize the amount of bandwidth it consumes, including:
- Bitmap caching
- Queuing keystrokes and mouse data
- Disabling unnecessary virtual channels
- Placing bandwidth limits on certain virtual channels
- Changing session parameters to minimize bandwidth requirements
None of these items alone is going to make a drastic change in bandwidth consumption, but considering each individually and applying them all as appropriate in your environment can add up to a fairly hefty difference.
Both the ICA and RDP protocols compress their data to minimize the amount of data that needs to traverse the network. This compression can shrink the data stream by as much as 50% for text-based applications (e.g. Word) and around 40% for graphics applications.
ICA and RDP compression is enabled by default, and there’s nothing else you need to do about it. There is sort of an “urban legend” about compression, where people sometimes recommend enabling it. In reality it’s always enabled (unless you specifically disable it in an ICA file or RDP file), so people who tell you to enable it don’t know what they’re talking about.
Some have argued that enabling compression requires extra CPU resources on the client and server to process it, but this overhead is absolutely minuscule in today’s world.
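ICA and RDP use their own compressors, but you can get an intuition for why text-heavy sessions compress so much better than graphics-heavy ones with any general-purpose compressor. This sketch uses Python’s zlib purely as an analogy; it is not the algorithm ICA or RDP actually use:

```python
import os
import zlib

# Repetitive, text-like data (think a Word document) compresses very well...
text_like = b"The quick brown fox jumps over the lazy dog. " * 200
compressed_text = zlib.compress(text_like)

# ...while noise-like data (think photographic bitmaps) barely compresses.
noise_like = os.urandom(len(text_like))
compressed_noise = zlib.compress(noise_like)

print(f"text-like:  {len(text_like)} -> {len(compressed_text)} bytes")
print(f"noise-like: {len(noise_like)} -> {len(compressed_noise)} bytes")
```

Run this and you’ll see the text-like buffer shrink to a tiny fraction of its size while the random buffer actually grows slightly, which is exactly why your mileage with protocol compression depends so heavily on the applications your users run.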
The ICA and RDP protocols also have the ability to cache portions of screen graphics (called “glyphs”) on the client device. Then, if the user navigates to a screen on the server that has portions of it stored in the client’s cache, the server can send down an instruction to the client requesting that it pulls portions of the new graphic updates from its local cache rather than the server sending down the same image again.
A client device must have some sort of local storage (hard drive, flash memory, etc.) to make use of the bitmap cache, but other than that there’s no real downside to using it. It can be enabled via the client GUI or via an RDP or ICA file.
Queuing keystrokes and mouse data
This is another option that’s enabled by default for both RDP and ICA sessions. With ICA, for example, keyboard typing data is queued on the client device and only sent to the server every 100ms (ten times per second, which is probably faster than people would ever notice). Similarly, mouse movements are only sent to the server every 50ms.
It’s important to note that this does not mean that an update goes from the client to the server every 100 or 50ms. It just means that whenever the user happens to be typing, updates only go out every 100ms.
In environments where bandwidth is a concern, you can increase the client queuing timers so that fewer packets are sent from the client to the server for a given amount of user input. You’ll have to experiment with the various settings to see what’s acceptable in your environment.
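To see why the queuing timers matter, here is a rough back-of-the-envelope sketch. (The 40-byte per-packet header figure is an assumption for illustration, ignoring ICA framing; measure your own environment with a sniffer.)

```python
def input_packets_per_minute(queue_timer_ms: float) -> float:
    """Worst-case client-to-server input packets for one minute of
    continuous user activity, given a client queuing timer."""
    return 60 * 1000 / queue_timer_ms

# A user typing continuously for one minute:
default = input_packets_per_minute(100)  # default 100ms keyboard timer
relaxed = input_packets_per_minute(300)  # experimental 300ms timer

# Each packet carries roughly 40 bytes of TCP/IP header overhead alone,
# so most of the saving is in per-packet overhead, not payload:
overhead_saved = (default - relaxed) * 40
print(f"{default:.0f} vs {relaxed:.0f} packets/min, "
      f"~{overhead_saved:.0f} bytes of header overhead saved")
```

The payload itself doesn’t shrink; you’re just sending fewer, fuller packets, which is why the technique helps most on links where per-packet overhead is significant.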
Disable unnecessary virtual channels
As you probably know, the ICA and RDP protocols are made up of several virtual channels (printing, port mapping, clipboard integration, audio, etc.). One of the “popular” recommendations floating around on the Internet is that you can save bandwidth by disabling unnecessary virtual channels. While this recommendation is rooted in truth, in reality a lot of people are sad to find that disabling virtual channels doesn’t magically save the day.
To know why, you have to know how the ICA or RDP virtual channel infrastructure works. Except for the initial connection (where client and server capabilities and virtual channels are negotiated), disabling virtual channels alone does not save any bandwidth.
However, having unneeded virtual channels enabled does increase the chance that a user could do something via one of them that consumes bandwidth. For example, leaving the clipboard synchronization virtual channel enabled means that a user cutting and pasting on their local workstation would inadvertently send that clipboard data to the server via ICA or RDP.
Therefore it’s probably still a good idea to disable virtual channels you’re not using as long as you understand the actual benefit you’ll get from this. (Virtual channels are enabled or disabled on a session by session basis, so you can completely customize which users get what.)
Capping Virtual Channels
If you’re using MetaFrame, instead of bluntly enabling or disabling entire virtual channels, you can take the more elegant approach of placing specific bandwidth limits on each individual virtual channel. (This cannot be done with RDP.)
Placing bandwidth limits on specific channels doesn’t technically save any bandwidth. What it does do is limit the ability for one channel to consume too much bandwidth that another channel might need.
Capping the bandwidth of a virtual channel will limit the amount of bandwidth the ICA protocol uses, but at the expense of causing individual channel operations to take more time. For example, if there are 100Kb of data that need to be synchronized to the clipboard via ICA, an unrestricted ICA client might spike to 50Kbps for two seconds to perform the sync. However, if you put a 10Kbps limit on the clipboard virtual channel, then that same synchronization would consume 10Kbps for ten seconds.
All other things being equal, this virtual channel cap doesn’t change the average bandwidth consumption at all. (100Kb over ten seconds, whether it’s 50Kbps for two seconds and 0Kbps for eight seconds or 10Kbps for ten seconds, is still an average of 10Kbps.) What the cap does do is “flatten” the curve, and in theory let the ICA client perform better when individual spikes occur.
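The averaging argument is easy to check numerically. A minimal sketch using the clipboard numbers from the example (one-second bandwidth samples, in Kbps):

```python
def average_kbps(samples_kbps):
    """Average bandwidth over a series of one-second samples."""
    return sum(samples_kbps) / len(samples_kbps)

# Uncapped: a 100Kb clipboard sync spikes to 50Kbps for two seconds, then idles.
uncapped = [50, 50, 0, 0, 0, 0, 0, 0, 0, 0]

# Capped at 10Kbps: the same 100Kb trickles out over the full ten seconds.
capped = [10] * 10

print(f"average: {average_kbps(uncapped)} vs {average_kbps(capped)} Kbps; "
      f"peak: {max(uncapped)} vs {max(capped)} Kbps")
```

Same average, but the peak drops from 50Kbps to 10Kbps; the cap trades short bursts for a smooth, predictable load.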
If you want to experiment with ICA virtual channel caps, you can do so with the SMC Console utility we referenced previously.
It’s worth pointing out that internally, both the ICA and RDP protocols have priorities assigned to each virtual channel—screen and keyboard data is the highest, printing is the lowest, audio is medium, etc. If you’re using ICA then you can tune these priorities on a server-wide basis. RDP’s priorities are hard-coded and cannot be changed.
Changing other Session Parameters to Minimize Session Bandwidth
The final thing worth mentioning in this section is that you should keep in mind that you can always change the properties of the session itself to minimize the amount of raw data that will need to flow from the server to the client. You can lower the resolution, decrease the color depth, or lower the audio quality to downsize the data hitting the ICA or RDP client.
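To see why resolution and color depth matter, consider the raw size of a single full-screen update before any compression or caching. (This is a rough upper bound: in practice ICA and RDP only send changed regions and compress them heavily.)

```python
def raw_screen_kb(width: int, height: int, bits_per_pixel: int) -> float:
    """Uncompressed size of one full-screen update, in kilobytes."""
    return width * height * bits_per_pixel / 8 / 1024

high = raw_screen_kb(1280, 1024, 24)  # high resolution, true color
low = raw_screen_kb(800, 600, 8)      # modest resolution, 256 colors

print(f"{high:.0f}KB vs {low:.0f}KB per full screen")
```

The high-end session has roughly eight times as much raw pixel data to move, which is why dialing back session parameters is such an effective (if cosmetically painful) lever.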
Change the Way the Network Deals with Load
All of the settings we’ve looked at so far affect the ICA or RDP protocol stream itself. However, you can also look at this from the network perspective and adjust the way the network actually deals with ICA or RDP data. This is primarily done with hardware devices that use the following technologies:
- Packet Shaping
- Network Compression
- Network Caching
We’ll discuss all of these technologies separately, although most devices on the market today are based on some or all of these technologies.
The packet shaping market used to be dominated by Packeteer, Allot, and Sitara, but now there are dozens of vendors in the space. Even though they won’t admit it, they all do basically the same thing. They’re hardware devices that typically sit on the network near the edge router and intelligently prioritize some traffic while limiting other traffic—all based on business rules. For example, you can configure one of these things so that ICA always has priority over everything else, or so that HTTP and MP3 download traffic is limited to 50% of your outbound connection.
The cool thing about these packet shapers relates to something we learned earlier: the faster a sending computer receives the “ack” packets from the receiving computer, the faster it sends the next group of packets. Because of this, packet shapers are just as effective at slowing down “remote” traffic as they are at slowing down local traffic, since they can simply intercept the local outbound “ack” packets and delay them, causing the remote computer to slow down its transmission.
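The arithmetic behind this trick is the classic TCP relationship: a sender can only have one window’s worth of unacknowledged data in flight, so its throughput is roughly capped at the window size divided by the round-trip time. A sketch (the window size and delay figures are just illustrative):

```python
def max_tcp_throughput_kbps(window_bytes: int, rtt_ms: float) -> float:
    """Rough upper bound on TCP throughput: one window per round trip.
    bytes * 8 bits / rtt in ms = bits per ms = kilobits per second."""
    return window_bytes * 8 / rtt_ms

# A 16KB receive window on a 50ms round trip:
unshaped = max_tcp_throughput_kbps(16 * 1024, 50)

# The same window when a packet shaper holds each ack an extra 150ms:
shaped = max_tcp_throughput_kbps(16 * 1024, 200)

print(f"~{unshaped:.0f}Kbps unshaped vs ~{shaped:.0f}Kbps with delayed acks")
```

Quadruple the effective round-trip time and you cut the sender’s maximum rate to a quarter, without dropping a single packet.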
In addition to packet shaping, some hardware network devices have the ability to compress all of the data that travels through them. (This works with all types of data, including encrypted data, since these devices just do low-level compression.) These devices work in pairs, typically with one on each end of a WAN link. As the data stream passes through, the first device compresses the data before it’s passed on to the edge router. The second device on the remote end receives the data, decompresses it, and sends it in its original format to the ultimate destination computer.
These devices are completely transparent to applications, and they work well with ICA and RDP traffic. The amount of compression they can offer varies, but they’re definitely worth looking into if you have bandwidth issues and they’re usually cheaper than buying more bandwidth.
Finally, some of these hardware devices also offer intelligent caching capabilities. Again working at a low protocol level, these pairs of devices sit on each side of a WAN link and analyze all traffic moving through. The receiving unit has the ability to cache packet payloads, and both units keep track of all data that passes through.
Then, if the sending end notices data that it has already sent to the receiving end, the sending unit transmits a small instruction tag that indicates the cached location of the original data. The receiving unit rebuilds the packets from its cache and sends the newly-reconstructed original packets on to the destination. This entire process is also transparent to the applications.
Whether network caching devices are effective in your environment really depends on your applications. If most of your remote users are using the same application and that application is built off of several screens, caching can be very effective, sometimes cutting traffic by 40%. On the other hand, if your users are using several random applications then the caching devices might cost you more money than what you can save in bandwidth.
The one caveat to using these types of caching devices is that they cannot be used if you’re encrypting the ICA or RDP data with SSL or SecureICA. The reason for this is obvious, in that no two encrypted packets would ever be the same even if they contained the same data.
Latency
As we mentioned previously, latency is probably the biggest headache in server-based computing environments. Having high latency can very quickly kill the usability of a system.
So how much latency is too much? That really depends on the type of application you’re using and how smart / patient your users are. Personally I can use a session with 500ms of latency no problem. My grandmother might need something under 200ms though.
There’s an interesting theory about latency and usability that was recently posited by Tim Mangan, a seasoned industry veteran and author of some fantastic tools. In a nutshell, Tim suggests that the actual amount of latency you have doesn’t really matter. What matters is that the latency is consistent. A session that is continuously at 400ms will be much easier to use than one that’s usually at 100ms but spikes to 300ms every few seconds. (View the full paper at Tim’s website, www.tmurgent.com.)
That being said, there are two things that you can do to minimize the effect that latency has in your environment.
- Free up bandwidth
- Citrix SpeedScreen
Free up Bandwidth
Remember that in many cases, high latency is caused by congestion on the network. Packets get queued up at the sending computer as they wait to be funneled through the limited bandwidth. Therefore, if you have issues with latency in your environment, the first thing you should do is follow the steps in the previous section to limit the amount of bandwidth that each session takes.
It’s worth pointing out here that capping specific virtual channels can be particularly effective at smoothing out the spikes that cause a backlog of packets and long latency times.
Citrix SpeedScreen Latency Reduction (SLR)
If you’re using Citrix MetaFrame, there is another great tool you can use to combat latency: Citrix’s SpeedScreen Latency Reduction (SLR).
SLR is technology that’s built into MetaFrame Presentation Servers and ICA clients that allows users to experience smooth typing in environments with high latency. Using SLR properly can, for example, allow a user to have a smooth typing experience while writing a Microsoft Word document—even if there is 200, 500, or (gasp!) 1000ms of latency between the server and their ICA client.
Citrix’s SpeedScreen Latency Reduction does two things. Firstly, (and most importantly), it provides something called “local text echo.” Local text echo allows characters to appear on the ICA client device’s screen the instant a user pushes the key on their keyboard.
For example, imagine a situation without SLR where a user is typing a document via an ICA session with 400ms of latency. When the user presses a key, the ICA client sends that key press code to the server. The server receives it, processes it, puts the character into the screen buffer, and sends the new screen image to the client device. However, due to the 400ms latency, the actual character doesn’t show up on the user’s screen until about a half-second after they first pushed the button. To the user, it would appear that there is a half-second “lag” when typing.
To address this, SLR’s Local Text Echo causes the ICA client to behave a bit differently. When enabled, a user pressing a key on their keyboard causes that key code to be sent to the server. However, at the same instant the local ICA client software also paints the appropriate character on the user’s screen even though the actual screen painting instructions from the server are bogged down in the 400ms latency between the client and server. Then, once the ICA client finally receives the actual updated screen from the server, it doesn’t have to update that character on the local screen since it already put it there back when the user pressed the key.
In a sense, SLR’s Local Text Echo is kind of like a “pre-caching” of the text. Local Text Echo is totally awesome and works really well. It works with all different fonts and font sizes.
The other major SLR feature is something called “Mouse Click Feedback.” This addresses another common problem in Citrix environments with high latency, namely, the fact that users click on a button, become impatient, and click on a button again before the first click registered. Then, when the application’s response finally makes its way to the user, whatever the user clicked on comes up twice.
Mouse Click Feedback works by adding the hourglass symbol to the arrow cursor the instant the user clicks the mouse anywhere within the boundaries of the remote ICA application. It does not technically prevent the user from clicking the button twice, but the general idea is that the user will see the hourglass and then have the patience to not click the button again.
In most environments with latency, people use both the Local Text Echo and Mouse Click Feedback components of Citrix’s SpeedScreen Latency Reduction.
It’s important to note that enabling SLR will increase the amount of bandwidth a session needs, sometimes by as much as 20%. This is due to the fact that extra data will have to move back and forth to coordinate the text rendering. However, SLR will let the user experience a smoother and more responsive session.
This creates an interesting challenge for you as an architect. In order to lower latency, you need to free up bandwidth by decreasing the ICA data transfers. In order to overcome high latency, you need to enable SLR, which increases bandwidth consumption. Which works best for you? That’s why you have a test lab.
Dropped Packets
If you lose enough packets, your session will eventually disconnect. From a percentage standpoint, both the ICA and RDP protocols can handle even a 10% packet loss rate; anything higher than that means you have a major networking issue to deal with.
Refer to the section at the end of this article about troubleshooting dropped and disconnected sessions for more information.
Out of Sequence Packets
Like lost packets, ICA and RDP can easily deal with the “standard” amount of packet sequence issues. Any more significant issues will affect the flow of ack and syn packets and will be a major problem, certainly much bigger than any Citrix or Terminal Server performance issue.
Jitter
There’s not too much you can do about jitter other than minimizing your ICA traffic size, forcing SpeedScreen to be used, and crossing your fingers.
Small MTU Sizes
If you’ve determined that the MTU size of some component of your network is smaller than 1500, you can experiment with a smaller MTU size on your Citrix or Terminal Server. I use the word “experiment” here because you’ll have to work with a network sniffer to see how the changes you make on the server affect the actual number of packets that go across the network.
Windows is smart enough to automatically configure the MTU size for each network adapter based on what that adapter requests. However, you can override the default settings by forcing an adapter to use a smaller size. This is done in the registry on an adapter-by-adapter basis.
Key: HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\adapter ID
Value: MTU (REG_DWORD)
Data: The MTU size in bytes. If you want to enter it as a decimal number, be sure to select the “Decimal” button.
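As a concrete sketch, a .reg file to force a 576-byte MTU on one adapter would look something like this. (The adapter GUID shown is a placeholder; substitute the actual interface ID from your own registry, and note that 0x240 hex is 576 decimal.)

```
Windows Registry Editor Version 5.00

; Force a 576-byte MTU on one adapter (replace the GUID with your adapter's ID)
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{YOUR-ADAPTER-GUID}]
"MTU"=dword:00000240
```

As always with registry changes, test this in your lab and reboot the server for the new MTU to take effect.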
The only drawback to manually adjusting MTU size is that it applies to all TCP/IP communication for a given network adapter. If you have users connecting wirelessly with an MTU size of 460 and regular users with an MTU size of 1500 then whatever you configure will affect them both.
Is there a happy medium where packets can be evenly split for wireless users but not too small to be inefficient for regular users? That’s something that you’ll need to figure out with your test environment and a network sniffer.
(By the way, the “wireless” users we refer to here with small MTU sizes are mobile wireless users connecting via mobile phone networks. 802.11 wireless connections use the standard MTU size of 1500 bytes.)
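A sniffer will give you the real numbers, but the underlying arithmetic is simple: every TCP/IP packet carries roughly 40 bytes of IP and TCP headers (assuming no options), so smaller MTUs mean proportionally more packets and more header overhead for the same payload. A sketch:

```python
import math

def packets_and_overhead(payload_bytes: int, mtu: int, headers: int = 40):
    """Packets needed to move a payload at a given MTU, and the total
    header overhead those packets carry."""
    per_packet_payload = mtu - headers
    packets = math.ceil(payload_bytes / per_packet_payload)
    return packets, packets * headers

# Moving 100KB of session data at various MTU sizes:
for mtu in (1500, 576, 460):
    packets, overhead = packets_and_overhead(100_000, mtu)
    print(f"MTU {mtu}: {packets} packets, {overhead} bytes of header overhead")
```

Dropping from a 1500-byte to a 460-byte MTU roughly triples both the packet count and the header overhead, which is the cost you’re weighing against fragmentation on the small-MTU segments.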
Troubleshooting Random Disconnects
The last thing we need to look at in this article is something that we’ve pretty much avoided so far: dealing with random session disconnects and dropped sessions. We’ve spent a lot of time discussing the various performance issues of sessions when they are actually connected, but what happens when a user is happily working and then suddenly their session is disconnected?
Rather than going through all the step-by-step methodology of how to troubleshoot (check the cable, try to ping the server, etc, etc.), we’ll focus here on what happens that causes the ICA or RDP protocol to disconnect a session.
A disconnect is actually a complex procedure. Several things need to happen for a session to become disconnected. (It’s a lot more involved than just yanking the network cable out of the wall.)
- The client needs to send packets to the server that go unanswered. After a period of time the client has to stop trying and give the user an error message indicating that communication with the server has ended.
- The server needs to realize that the client device is no longer sending packets, and it needs to reclassify the user’s session as “disconnected.” (Or it has to reset the session if disconnected sessions are not allowed.)
Let’s take a more detailed look at each of these, beginning with the client.
What happens when a session is dropped—a client’s perspective
Imagine an ICA or RDP client being used normally. The client software converts the keystrokes and mouse movements into ICA or RDP data packets and then hands them off to the TCP/IP interface. The TCP/IP and networking subsystems handle the complexities of adding sequence numbers, sending the data, and receiving ack packets from the server. If for some reason the client doesn’t receive the ack packet before the retry timer is up, you’ll remember that it doubles the time and tries again, then doubles it again and tries again, etc. All of these retries are handled by the low-level TCP/IP subsystem—they’re hidden from the application itself. However, once the max retry count is reached (remember this defaults to five in Windows), the TCP/IP subsystem reports back to the application that the packet could not be sent.
What happens next? That depends on the application. Microsoft’s Remote Desktop Connection client, for example, causes the remote RDP window to fade to grayscale. Then a little box pops up on the screen indicating that the system is now trying to reconnect with attempt 1 of 20. The client software sends another packet down to the TCP/IP subsystem. The TCP/IP subsystem will transmit that packet, wait for the ack, double the timer, retransmit, wait, double the timer, etc., until the five retries have been exhausted. It will then report back to the application that the packet transmission failed, and the RDP client will update the window to say “attempt 2 of 20” and the whole process repeats.
If the remote server finally does respond (by sending data or an ack) then the TCP/IP subsystem notifies the application, and the RDP client springs back to life in full color.
What happens when a session is dropped—a server’s perspective
The server also needs to keep track of which sessions are active. After all, if one user disappears then the server will need to change the status of that user’s session from “active” to “disconnected.”
Remember that we discussed that the ICA and RDP protocols are very bursty. This means that they are almost dormant in-between bursts. This can cause a problem if a network connection fails during one of these quiet periods, since the server would not even know the client has become disconnected because it wasn’t expecting any client input. (Of course, meanwhile the client may be stepping through all of its reconnection steps, but the server wouldn’t know this since the connection has been lost.)
Why does it matter if the server knows whether or not a client is disconnected? In MetaFrame XP, ICA clients cannot connect to active sessions; they can only connect to disconnected ones. The problem this causes is that if a client drops the connection and then quickly tries to connect again, the server won’t realize that the first connection was dropped, leaving the session in an active state. Upon reconnection, the server will not be able to assign the active session to the user, so the user will have to create a new session from scratch. By the time the server realizes the first connection is lost it’s too late—the user already has a second session.
(Citrix has addressed this from the client side with the “auto client reconnect” functionality of newer ICA clients. Auto client reconnect first checks to see if the session that was dropped is still active, and if so, it disconnects it before connecting back to it.)
To address this problem from the server side, a feature called “ICA Keep-Alives” is used. ICA Keep-Alives are enabled by default in all new MetaFrame environments.
ICA Keep-Alives use a timer to track how long it’s been since they last had contact with an ICA client. Once the timer is up, the server sends a Keep Alive packet (kind of like an “ICA ping”) to make sure the client is still there and running. If not then it switches the status of the session to “disconnected.”
ICA Keep-Alives are another one of those things that you have to balance. For example, if you set your Keep-Alive interval to be five seconds then your environment will be very responsive and dropped sessions will be identified quickly. However, a five-second keep-alive interval also means that a lot of unnecessary traffic will be flowing across your network as your server checks in with each session every five seconds.
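The tradeoff is easy to quantify. Assuming a keep-alive exchange costs on the order of 100 bytes on the wire (an illustrative figure; actual probe sizes vary), here is a sketch of the hourly overhead:

```python
def keepalive_bytes_per_hour(sessions: int, interval_s: int,
                             bytes_per_probe: int = 100) -> int:
    """Rough hourly keep-alive traffic for a server full of idle sessions."""
    probes_per_session = 3600 // interval_s
    return sessions * probes_per_session * bytes_per_probe

aggressive = keepalive_bytes_per_hour(100, 5)   # 100 sessions, probe every 5s
relaxed = keepalive_bytes_per_hour(100, 60)     # probe every 60s

print(f"{aggressive / 1024:.0f}KB/hour vs {relaxed / 1024:.0f}KB/hour")
```

Even the aggressive setting is tiny on a LAN, but multiplied across hundreds of sessions on a thin WAN link it can become a meaningful slice of exactly the bandwidth you were trying to conserve.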
There’s one other interesting aspect of the ICA keep-alives timer we should cover. To see where we’re going here, think back to how an application interfaces with the low-level TCP/IP stack and the role of the TCP retry count.
Let’s assume that you set your ICA keep-alive interval for 30 seconds. After 30 seconds the server constructs an ICA keep-alive packet and hands it off to the TCP/IP interface to be sent to the client device. If the client device really has been lost, then of course it won’t send back an ack packet. After a certain amount of time (remember this is dynamic based on the current session parameters) the server will try again. Let’s say the initial retry timer is two seconds, meaning 32 seconds have now gone by since the client last checked in. With the timer doubling on each attempt and the TCP max retry count at its default of five, your server will retransmit that keep-alive packet again at 36, 44, 60, and 92 seconds. Only after that fifth retry at 92 seconds also goes unanswered will the TCP/IP subsystem report back to the application that the remote computer is not there.
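The doubling schedule in this example is easy to compute directly. A sketch (the two-second initial timer and five retries come from the example above; real initial timers are derived from the measured round-trip time of the connection):

```python
def retry_times(start_s, initial_timer_s, max_retries):
    """Wall-clock times of each TCP retransmission, given a retry timer
    that doubles after every unanswered attempt."""
    times, timer, t = [], initial_timer_s, start_s
    for _ in range(max_retries):
        t += timer
        times.append(t)
        timer *= 2
    return times

# Keep-alive sent at t=30s, 2s initial timer, Windows default of 5 retries:
print(retry_times(30, 2, 5))  # [32, 36, 44, 60, 92]
```

Notice how the last gap alone is 32 seconds; start with a longer initial timer on a high-latency link and the whole detection window stretches dramatically.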
Of course this is just an example, but the dynamic retry intervals could mean that the whole ICA keep alive interval could be two minutes or more on connections that initially have a lot of latency.
How does Session Reliability factor in to this?
One of the new features of Citrix MetaFrame Presentation Server 3.0 Advanced or Enterprise editions is something Citrix calls “session reliability.” On the surface, Session Reliability hides the fact that client packets are going unanswered by the server (apart from a small spinning hourglass cursor). The idea here is that by hiding this from the user, the user will have a better experience during small network “hiccups.”
Session Reliability is enabled by default in MPS 3 environments when Win32 clients connect with version 8 or higher. When enabled, it wraps the standard ICA port 1494 communication in a new CGP-based protocol that uses port 2598 and sends it off to the server. The Citrix XTE service running on the server receives this traffic, peels off the reliability layer, and passes it internally to port 1494.
From a dropped session troubleshooting standpoint, the important thing to know about Session Reliability is that it does not use ICA keep-alives since the CGP protocol wrapper contains its own mechanisms for this.
In the real world, your servers will probably have a mixture of standard connections and Session Reliability connections (especially since Session Reliability only works with certain clients and never works through Citrix Secure Gateway). This means that you’ll still have to work with ICA keep-alives even in newer environments.