This is part one of an article that's turning out to be a lot longer than I expected. I wanted to put it out here to get your feedback so far. Today's section discusses the various networking characteristics and how they affect Citrix and Terminal Server, and how you can figure out if they're affecting your environment. In part two of this article we'll look at what you can do to tune your servers to deal with these issues.
I have written extensively in the past about how you can tune the performance of your Citrix MetaFrame or Microsoft Terminal Servers. However, I think I’ve only been focusing on part of the performance challenge.
The part that I’ve been neglecting is the network. As far as overall performance of your environment is concerned, it doesn’t matter how well you tune the processor, memory, and application performance of your servers if you have a bad network. A poorly performing network can kill Terminal Servers that work fine locally.
Unfortunately, every network is “bad” at some point, and most of the people reading this article probably don’t have the ability to do much about that. Therefore, we’ll focus on how certain characteristics of the network can affect your server-based computing environment and what you can do about it.
How can network characteristics affect server-based computing?
I think it’s safe to say that when thinking about network performance, most people focus on two network characteristics: bandwidth and latency.
In order to really dig into it though, we also need to look at a few more properties of a network: packet loss, out-of-order sequencing, jitter, and MTU size.
Let’s step through each of these six characteristics and look at how they can affect a Citrix MetaFrame or Microsoft Terminal Server environment. Then we’ll look at how you can test for these and what you can do about them.
Bandwidth is the most commonly discussed networking characteristic. It simply defines the amount of data that can be transferred from one computer to another in a certain amount of time. This leads to the classic question:
So how much bandwidth does Citrix take?
The classic answer: That depends. How much do you have?
While this is somewhat of a joke, it’s actually rooted in truth. Most applications will take as much bandwidth as they possibly can. In general, more bandwidth is always better, although there are limits to what’s practical.
Another variation of the classic question is “I have xxx bandwidth. How many Citrix users will that support?”
How does this affect Citrix?
In order to answer these questions and understand how the amount of available bandwidth will affect Citrix MetaFrame and Terminal Server environments, it’s important to understand how the ICA and RDP protocols work.
Fundamentally, Citrix’s ICA and Microsoft’s RDP protocols are simple data streams that are sent between the client and the server. If the amount of data moving through the stream is less than the amount of bandwidth that’s available, then everything will be fine and you’ll have no problem. However, if your ICA or RDP protocol tries to use more bandwidth than is available, interesting things will start to happen.
But how much bandwidth does ICA or RDP use? There are statements floating around on the Internet that say things like “Citrix ICA uses 15kbps” (or 20k or 30k or whatever). There is one thing that’s critical to understand about these statements: Any statement about ICA or RDP bandwidth usage simply refers to an average over some period of time. The longer the time period, the more accurate the average.
In real life, ICA and RDP are extremely “bursty.” If you don’t touch the mouse and nothing’s happening on the screen, no data will cross the network (except for the occasional keep alive packet or similar). However, if you launch a new application that completely updates the screen, you might see 70k of data move from the server to the client in 1/3 of a second.
If you open Performance Monitor and view the amount of bandwidth a session takes (discussed later), you’ll see many sharp spikes. These “15kbps average” numbers simply come from someone running a session for maybe an hour and then looking at the results. “Hmmm... 6.7MB was transferred over the last hour, so that means this session averaged 15kbps.” While it’s true that 15kbps was the “average,” the session probably actually used maybe 100kbps for 30 seconds (distributed over the hour) and 2kbps for the remaining 59 minutes and 30 seconds.
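The arithmetic behind that kind of "average" is worth making explicit. A quick Python sketch, using the hypothetical 6.7MB-per-hour figure from the example above:

```python
def average_kbps(total_bytes: float, seconds: float) -> float:
    """Average throughput in kilobits per second over the whole window."""
    return total_bytes * 8 / seconds / 1000

# 6.7 MB transferred over one hour, as in the example:
print(round(average_kbps(6.7e6, 3600), 1))  # -> 14.9
```

The same function would report a wildly different number if you measured over a five-second window that happened to contain a screen update, which is exactly why short-window spikes and long-window averages tell such different stories.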
As a side note, there is a rumor floating around on the Internet that the RDP protocol is a “stream” protocol (like RealAudio, etc.) and that it consumes bandwidth regardless of whether the user is actually doing something. That rumor is absolutely 100% false.
So how much bandwidth does ICA or RDP actually require? It all depends. Of course there are things you can do (discussed later) to “minimize” the bandwidth consumption of RDP or ICA, but that minimization really does nothing more than limit the spikes in data transfer.
Where you’ll start to see issues is when ICA or RDP wants to “spike” more than the available bandwidth (either because other data is on the network or because the network limit is hit). When this happens, the ICA or RDP data will be queued on the sending machine until it can be sent. This has the effect of increasing latency (as discussed in the next section). Of course, if too much data is queued up then it’s possible that the sending computer’s network output buffers will become full, and then data packets will be discarded (as discussed in the packet loss section later in this article).
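To put a rough number on that queuing effect, here's a Python sketch. The 70kB burst is the screen-update figure from the earlier example; the 128kbps link speed is just an assumed value for illustration:

```python
def queue_delay_ms(burst_bytes: int, link_kbps: float) -> float:
    """Extra latency added while a queued burst drains through the link."""
    return burst_bytes * 8 / (link_kbps * 1000) * 1000

# A 70 kB screen update draining through a 128 kbps link:
print(round(queue_delay_ms(70_000, 128)))  # -> 4375
```

In other words, a burst that would clear a LAN in a fraction of a second can add several seconds of effective latency on a slow link, which is why bandwidth problems so often show up to users as "lag."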
Latency is the amount of time, measured in milliseconds (1000ms = 1 second), that it takes for data to cross a network between two computers. On computers that are connected to the same switch, latency might only be 2 or 3ms. Many connections over the Internet will have latency in the 200 to 300ms range. Connections bounced over satellite communications might have 700 to 1000ms latency.
How does this affect Citrix?
In all cases lower latency is better. In Citrix and Terminal Server environments, having low latency is probably more important than having adequate bandwidth. This is due to the fact that all interactive user activities happen remotely when using ICA or RDP.
Imagine a server-based computing connection with 200ms one-way latency. As a user, you would type the letter “A” on your client device. It would take 200ms for the packet containing the keycode for the letter “A” to arrive at the server. It would take a few milliseconds for the server to process the keystroke and put the “A” on the virtual framebuffer, and then it would take another 200ms for the updated screen image to get back to your client device. Therefore, the total roundtrip delay would be around 400ms, or almost half a second. The effect is a nearly half-second “delay” in your typing.
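That round-trip arithmetic can be sketched in Python (the 5ms of server processing time is an assumed figure, standing in for the "few milliseconds" above):

```python
def typing_delay_ms(one_way_latency_ms: float, server_processing_ms: float = 5) -> float:
    """Total time from pressing a key to seeing the character on screen:
    one trip to the server, some processing, one trip back."""
    return one_way_latency_ms * 2 + server_processing_ms

# The 200ms one-way example from the text:
print(typing_delay_ms(200))  # -> 405
```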
Environments with very low latency will be taken for granted by your users, but as soon as that latency starts to creep up, you’ll start hearing about it.
Packet loss is the term that defines what happens when packets from the sending computer don’t show up at the receiving computer. In server-based computing environments this can be from client to server or server to client. Packets usually get lost when there is a problem with the network or when the network is too busy. To understand how this applies in server-based computing environments, we need to take a step back and look at how TCP/IP packet-based communication takes place in general.
Fundamentally, two nodes on a network communicate by sending a stream of data back and forth. However, since TCP/IP networks are “packet switched,” this stream of data is broken up into a bunch of little pieces, and each of these little pieces arrives at the destination on its own where the destination computer reassembles them to form the original data stream. The exact size of these packets varies, but in most standard Ethernet environments each packet is 1.5kB.
There’s another interesting aspect of TCP/IP communications that’s relevant to ICA or RDP communication. Fundamentally, network communication is viewed as unreliable. There is no guarantee that a packet sent by a computer will actually get to its destination. (Many things could cause this: Collisions, equipment failure along the way, congestion, etc.)
As you can imagine, in an ICA or RDP environment, a lost packet would cause a “hole” in the stream of data. What if that “hole” contained keystrokes or characters to be put on the screen? Clearly, losing packets is a bad thing.
If a packet got lost, the sending computer would need to know about it so that it could resend the missing data. In the TCP/IP world, this is done via a special packet type called an “ack” (or “acknowledgement”). It works like this: The sending computer sends a bunch of packets to the receiving computer. Each packet contains a sequential serial number (called a “sequence”). Every so often the receiving computer sends an “ack” packet back to the sending computer with the number of the most recent sequence that’s been received in full. Therefore, if any of the sending computer’s packets got lost along the way, the receiving computer would never send out the ack, so the sending computer knows that it needs to resend the packet. See Microsoft knowledgebase article 169292 for all the gory details of acks, receiving window buffers, and other TCP/IP communication internals.
(Of course an ironic side effect of this is that sometimes the ack packet will get lost in transit, and the sending computer will resend the packet even though the original packets actually arrived.)
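As a toy model of this cumulative acknowledgement scheme (simplified to whole-packet sequence numbers rather than TCP's real byte-level sequences):

```python
def next_expected_seq(received: set) -> int:
    """Cumulative ack: the lowest sequence number not yet received.
    Everything below this number is implicitly acknowledged."""
    seq = 0
    while seq in received:
        seq += 1
    return seq

# Packets 0, 1, 2, 4, and 5 arrived, but 3 was lost in transit.
# The receiver keeps acknowledging up to 3, so the sender knows
# to retransmit packet 3 even though later packets got through:
print(next_expected_seq({0, 1, 2, 4, 5}))  # -> 3
```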
So how is this relevant to ICA and RDP? Stay with me…
As we said, if the sending computer never receives the ack packet from the destination computer, the sending computer resends the original packet. However, it only does this a fixed number of times; if it still fails after that, the TCP/IP protocol driver gives up and reports back to the application that there was some kind of networking communications error. In Windows, the default number of retries before reporting failure is five, although you can change this in the registry (via the TcpMaxDataRetransmissions value). This is called the “TCP Max Retry Count.”
Of course computers are very fast, and a sending computer could burn through all five retries in the blink of an eye. To address this, the TCP/IP standards dictate that the sending computer should double the amount of time it waits for an ack response before retransmitting the original packet. For example, if the sending computer waited 2 seconds for the ack before retransmitting, the third attempt would come 4 seconds after the second, the fourth 8 seconds after the third, and so on (up until the TCP Max Retry Count is hit).
One of the cool things about TCP/IP is that the exact duration of the retry timer is automatically managed by the operating system and constantly changes for each destination computer. It starts at 3 seconds, but it goes up if the ack packets are taking a long time and goes down if they’re coming back quickly. In doing this, each connection is automatically tuned for the actual network performance.
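The doubling schedule described above can be sketched as follows. The 3-second initial timer and five retries are the Windows defaults mentioned earlier; as just noted, real TCP adjusts the initial timer dynamically per connection:

```python
def retry_schedule(initial_rto_s: float, max_retries: int) -> list:
    """Seconds waited before each successive retransmission attempt,
    doubling each time (exponential backoff)."""
    return [initial_rto_s * 2 ** i for i in range(max_retries)]

# Windows defaults: 3-second initial timer, five retries.
print(retry_schedule(3, 5))  # -> [3, 6, 12, 24, 48]
```

Note that the waits add up to over a minute and a half, which is why a connection over a badly lossy link can appear to hang for a long time before it finally drops.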
How does this affect Citrix?
Since lost packets must be retransmitted, having a high number of packets lost is effectively like adding a lot of latency. Of course if too many are dropped, then that means that the retransmits are getting dropped too, so eventually your ICA or RDP connection will disconnect.
This issue is usually only seen on unreliable wide-area networks, such as mobile telecom-based wireless networks and remote offices connected via the Internet.
Out of Sequence Packets
Because the data stream is broken up into many pieces which each get to the end location on their own, there’s a chance that individual pieces of data might arrive in a different order than they were sent out. This is called “out of sequence” packets because the sequence numbers of the packets will be in the wrong order.
However, the fact that each packet contains a sequence number means that the receiving computer can reassemble the pieces back into the proper order for the receiving application.
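A minimal sketch of that reassembly step (whole-packet sequence numbers again, for simplicity):

```python
def reassemble(packets: list) -> bytes:
    """Reorder (sequence_number, payload) pairs by sequence number
    and rebuild the original byte stream."""
    return b"".join(data for _, data in sorted(packets))

# Packets arrived in the order 2, 0, 1 -- the stream still comes out right:
print(reassemble([(2, b"C"), (0, b"A"), (1, b"B")]))  # -> b'ABC'
```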
How does this affect Citrix?
The good news about packets that are received out of sequence is that this is something that’s hidden from the application (the ICA or RDP protocol interfaces), so there’s nothing you have to worry about here with regard to server-based computing environments.
Of course in the real world, these four aspects of a network usually change over the course of a connection based on current conditions. “Jitter” is simply the term that describes the tendency of a network to have rapidly varying amounts of bandwidth or latency or packet loss. One second it’s fine. The next second you suddenly have 600ms latency. Two seconds later you’re back to 10ms latency. Then you suddenly drop 10% of the packets. Etc.
How does this affect Citrix?
Some aspects of the ICA protocol and TCP tune themselves dynamically, so jitter doesn’t affect them. However, other aspects (the session experience settings, SpeedScreen, etc.) are set once, based on the condition of the connection when it’s first made. If the performance of the network then deteriorates, your users will have server-based computing sessions that are “tuned” for the wrong network conditions.
MTU stands for “Maximum Transmission Unit.” In simple terms, it’s the size of the biggest packet that can travel across a network. As we mentioned earlier, most Ethernet networks have an MTU size of 1500 bytes (1.5k). However, this can vary by situation. Specifically, most of the mobile telecom-based wireless networks (1xRTT, EVDO, UMTS, GPRS, etc.) have MTU sizes that are much smaller, perhaps in the 400-600 byte range.
How does this affect Citrix?
Let’s think about what happens when you have ICA or RDP clients on the other end of a mobile wireless network that has a small MTU size. The “M” in “MTU” stands for “maximum,” but there is no minimum size limit. Therefore, if you have a user connecting from a wireless device where the network has an MTU size of 460 bytes, then all the packets that the Citrix or Terminal Server receives will be no larger than 460 bytes. So far, so good. No problems yet. (In fact, the initial connection requests from the client don’t contain much data, so many of those packets will be even smaller than 460 bytes. A TCP/IP packet is only as big as it needs to be. It is never “padded” to fill the MTU size.)
When the Citrix or Terminal Server starts sending its ICA or RDP data stream to the client device, the data stream will be broken up into packets that are 1500 bytes. (1500 is the default setting for Windows.)
However, at some point that 1500-byte packet will hit the mobile wireless network with the MTU size of 460. That network’s edge router will have to break apart that 1500-byte packet into smaller chunks. It will create three 460-byte packets and one 120-byte packet. (I’m simplifying the math here by not counting the TCP/IP stack overhead.)
So how does this affect ICA or RDP performance? In many mobile networks, there is a lot of overhead associated with transmitting packets, regardless of the size of the actual packet itself. In our example we have each packet from the server being broken into four packets for the mobile network. However, one out of every four mobile packets only contains 120 bytes of data, which means all of the overhead needed to transmit a packet is “wasted” on a mere 120 bytes.
Imagine that three 1500-byte packets come from the server. They would be broken into nine 460-byte packets and three 120-byte packets, or twelve packets total. But wouldn’t it be better to combine the three 120-byte packets into a single 360-byte packet? That would mean that you would only need to transmit ten packets across the wireless network instead of twelve, a savings of nearly 17%.
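The packet-count math above can be sketched in Python (payload sizes only, ignoring header overhead as in the example):

```python
import math

def fragment_count(packet_size: int, mtu: int, packets: int = 1) -> int:
    """Number of fragments produced when `packets` packets of
    `packet_size` bytes cross a link with the given MTU."""
    return packets * math.ceil(packet_size / mtu)

# Three 1500-byte packets naively fragmented for a 460-byte MTU:
naive = fragment_count(1500, 460, packets=3)  # twelve fragments
# Coalescing the three 120-byte leftovers into one 360-byte packet:
coalesced = 3 * (1500 // 460) + 1             # ten packets
print(naive, coalesced)  # -> 12 10
```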
The bottom line is that MTU size changes along the route between your server and ICA or RDP clients can add inefficiencies into your server-based computing environment.
Figuring out which of these issues are affecting you
All networks exhibit some of these six characteristics. That’s just a fact of life. The challenge for you is that you have to figure out which (if any) of the six are having a negative impact in your environment and whether it’s even possible to make a change that will improve the performance of the session. To understand what we’re talking about, let’s start with bandwidth limitations.
It’s pretty easy to figure out how much bandwidth an ICA or RDP session is taking. What’s difficult is figuring out whether reducing the bandwidth used will actually translate into better performance for the users. (Of course some could argue that lowering the bandwidth consumption will save on networking costs in the long run, so it’s a good thing either way.)
Figuring out whether bandwidth is an issue for you is a multi-step process. The first step is to get some understanding of how much bandwidth a particular session is taking. A quick down-and-dirty way to do this is to use a great $20 tool called DU Meter (for “Download / Upload Meter”). It’s a little app that runs in a window in the corner of your desktop. It’s made for people who need to track their bandwidth usage for their broadband provider, but I like it because it’s cheap, small, simple, and doesn’t require a reboot to install or remove.
DU Meter shows live statistics for downloads and uploads. So, fire it up and launch an ICA or RDP session. You’ll see the little display in the corner constantly updating the amount of outgoing and incoming traffic in KB. The best part is that it has a little stopwatch function you can start that will show you the total, maximum, and average data transfer (in KB) in both directions from your client device. I like to drop DU Meter on a desktop and have a user use their session for a while to get a feel for what I’m dealing with. (Just keep in mind that DU Meter captures all network traffic, so if Windows decides to automatically download a hotfix in the background while you’re running your test, you’ll get some weird results.)
You can also use the Performance Monitor MMC snap-in on the Terminal Server to view the total data transferred for a particular session. (Terminal Services Session | Total Bytes | Select the ICA or RDP session you’re interested in.)
The challenge here is that once you have this data, what do you do with it? (More on this later.)
The quickest way to check the latency of an environment is to ping the server from the client. However, a more accurate way is to use the Performance Monitor. If you’re using Citrix MetaFrame, you’ll find a performance object called “ICA Session” with several counters for latency, including the current latency, average latency, and deviation (the difference between the maximum and minimum recorded latency). There is one instance of the ICA Session object for each ICA session, so you need to pick the right session from the list if multiple users are accessing the system and you only want to investigate one.
Figuring out whether you are experiencing a significant amount of packet loss is a bit more involved than checking some of the other things, but it’s still possible, and there are a few different ways to do it.
A lot of people will simply execute the ping command with the “-t” option (which causes it to run forever) and then sit back and look for timed-out pings. The problem with this is that real-world packet loss rates vary depending on network load, so to get a real test you should ping the server from the client while a user has an ICA or RDP session open. Of course, then your ping command would introduce its own load into the environment, thereby contaminating the test, so that’s really not the best way to do things.
Another option is to use a packet sniffer (my favorite is Ethereal because it’s awesome and free) to look for retransmitted packets, but that can be a fairly involved process.
One of the easiest ways to check for packet loss is to come back to Performance Monitor and track the “Segments Retransmitted / Sec” counter under the TCP object. Since this is a “per second” rate you’re not going to get any aggregates, but any significant jump above zero should tell you that something’s not right. (ICA and RDP can easily deal with packet loss as high as 10% without any real performance problems, so most likely this is not your problem.)
Out of Sequence Packets
It’s not really practical to look for out-of-sequence packets per se, but if this happens a lot you’ll see it reflected in other areas, such as higher latency.
If you’re using MetaFrame, it’s easiest to look for wild changes in current latency in a session or high latency deviations (both of which are covered in the ICA Session performance object).
MTU Size Limits
As was mentioned previously, most likely the MTU size of all the networks between your clients and server is 1500. You can easily test this with the ping command. Use the “-l” option (that’s the letter “L,” not the number “1”) to specify the length of data to send in the ping packet. (The default is 32 bytes.) For example, you could try to send a packet that is 1500 bytes to see if it makes it. You’ll also need to use the “-f” option to mark the ping packet as “not fragmentable”; otherwise it could get broken into pieces and ruin your test.
For example, try the following command: ping -l 1472 -f yourservername
You’ll see that over the Internet (and in many routed corporate sites) the maximum length you can specify with the ping command is 1472. That’s because the IP and ICMP headers add 28 bytes to the packet, bringing the size up to an even 1500. If you get a response that says “Packet needs to be fragmented but DF set” then you know that the length you specified is bigger than the MTU size somewhere along the way. In this case, play a little “guess and check” until you find a length that results in a successful ping. Then, add 28 to that number and you have the MTU size for that route.
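If you'd rather not play guess-and-check by hand, the search can be automated as a binary search over payload sizes. This is just a sketch: the `probe` callback is a hypothetical wrapper around the ping command (not shown), and here it's simulated with a fake path whose MTU is 576 bytes:

```python
def find_max_payload(probe, lo: int = 0, hi: int = 1472) -> int:
    """Binary-search for the largest ping payload that survives with
    the don't-fragment flag set. `probe(size)` should return True if
    a ping with that payload size succeeds (e.g. a wrapper that runs
    `ping -l <size> -f <host>` and checks for a reply)."""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if probe(mid):
            lo = mid      # this size fit; try larger
        else:
            hi = mid - 1  # too big; try smaller
    return lo

# Simulate a path with a 576-byte MTU: a payload fits if
# payload + 28 bytes of IP/ICMP headers <= the path MTU.
path_mtu = 576
max_payload = find_max_payload(lambda size: size + 28 <= path_mtu)
print(max_payload + 28)  # -> 576 (add 28 back to get the MTU)
```

Each probe halves the remaining range, so even starting from the full 0-1472 range you'd find the answer in about eleven pings instead of dozens.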
Continue on to Part 2 for the rest of this article. We'll discuss how you can build a test lab to simulate these issues and how you can tune your servers to deal with them.