Understanding the new logon throttling load evaluator options in Citrix Presentation Server 4.5

Load balancing in a Citrix Presentation Server environment has not fundamentally changed since the days of MetaFrame XP. Today's PS 4.5 environments still have zone data collectors, load evaluators, and load indexes.

As a very, very quick refresher, remember that the load evaluator is a thread in the IMA Service on a Presentation Server that calculates the load index for that server. The load index is an integer value from 0 to 10,000 that objectively represents how busy a server is. A value of 0 equates to no load, while a value of 10,000 represents a full load and will prohibit new user connections from being load-balanced to that server.

Each server continuously sends its current load index to its zone data collector, meaning that the data collector knows the relative load of multiple servers. When an incoming connection request arrives, a quick check with the zone data collector reveals which server is most appropriate for that user to connect to.

This load balancing architecture generally works fine but has one fatal flaw: new incoming requests are always routed to the least-busy server. While this doesn’t seem like a problem at first, consider an environment that has maybe 20 servers each with 80 users. If a new server is brought online (or if a server unexpectedly reboots), that new server will have 0 sessions while the other servers all have 80 sessions. So guess which server the next 80 users will be routed to?

Again, this is not a problem in theory, but in reality, the logon process is “expensive” in terms of CPU and processing power, and in many cases even the fastest servers can only actually process a handful of logons at the same time.

So if this 20-server farm gets 20 new logons at more-or-less the same time, they will all be routed to that new server. Since it’s unlikely that the new server can actually handle 20 logons at once, some of these logons will fail. The users will try to re-logon, and they will be again routed to that new server (which at this point has maybe 10 sessions compared to the 80 on the other servers).

What this means is that a new server coming online during a busy period can essentially prevent all new logons across your entire farm! This is known colloquially as “the black hole effect.”

Citrix of course is aware of this problem. When the current generation of load balancing technology was released in MetaFrame XP, Citrix attempted to mitigate this with something called the “load bias.” The load bias was a temporary and artificial increase of a server’s load index that happened whenever a server received a new incoming connection.

The logic is this: If we go back to our example of 20 servers with 80 users each, those twenty servers might be humming along just fine, each with a load index of around 8000. (Remember the load index is an integer value from 0 to 10,000 that indicates how relatively busy a server is.) When the new server comes online, it will have a load index of 0. When a new user connects to that new server, the load evaluator on that new server will calculate its new load index and submit that to the zone data collector. The problem is that the whole process of establishing a session and updating the data collector takes time—a minute? Maybe two? Meanwhile, the data collector keeps on sending new user requests to that server (since it sees a load index of 0 versus 8000 on the other 20 servers), and that new server is quickly overrun, creating the black hole effect.

To address this, the zone data collector will temporarily increase a server’s load (via that “load bias”) whenever a new connection request comes in. This temporary increase will be voided whenever the user establishes their session and the server updates the data collector with its real load index.

The Load Bias in MetaFrame XP

In MetaFrame XP, the load bias was 200 points.

This 200-point temporary increase worked great in situations where you had a bunch of servers that were all more-or-less loaded equally, since it prevented a burst of new users from all going to the same server. But in our new-server-versus-20-full-servers scenario it’s worthless, since the 200-point load bias wouldn’t really have an effect when one server had a load index of 0 versus all the other servers with load indexes of 8,000.

Fortunately, even in the MetaFrame XP days you could change this load bias via the registry (HKLM\SOFTWARE\Citrix\IMA\LMS\LoadBias). “Great!” people thought. “I can just set the load bias to 1000 and be all set!” Unfortunately, if your load evaluators contained the “user count” rule, that registry key was ignored and Citrix used a different formula to calculate the load bias! (This is crazy, because pretty much everyone uses that rule.)

The Load Bias in MetaFrame Presentation Server 3.0

In MetaFrame Presentation Server 3.0, Citrix added a new registry value called “ForceRegBias” in the HKEY_LOCAL_MACHINE\SOFTWARE\Citrix\IMA\LMS key that would force the load evaluator to use the load bias you entered into the registry regardless of what rules you used.

While this helped, it was still far from ideal.

The Load Bias in Citrix Presentation Server 4.0

Nothing changed with the release of Presentation Server 4, but Hotfix Rollup Pack 1 (HRP01) for PS 4 introduced another load bias modification that Citrix called “Slow-Start Load Balancing.”

The idea behind slow-start load balancing was that Citrix would officially give logons a really high load bias. How high? It depended on what the current load index of the server was. The data collector would create a load bias that was half the remaining load for logons! The formula is:

Temporary slow-start load index = Current Load + 1/2 of (10000 - Current Load)

In other words, a server that had a load index of 4000 would have a temporary slow start load index of 7000. The load bias in this case was half of the remaining headroom (or half of 6000, since the server had a load index of 4000 out of a max of 10,000). The load bias (3000) is added to the original load (4000) to come up with the 7000 temporary load index.

Temporary load index = 4000 + 1/2 of (10000 – 4000) = 4000 + 1/2 of 6000 = 7000

The process continued for as many simultaneous logons as there were. If that server with a load index of 4000 got a single logon, the load index increased by half the headroom to 7000. If a second simultaneous logon occurred, that server would see its load index increase again by half of the remaining headroom, taking it up to 8500. A third simultaneous logon would increase it again by half the remaining headroom, bringing it up to 9250, and so on.
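This iteration is easy to sketch in code. (This is an illustration of the math described above, not Citrix’s actual implementation; the function name is made up for the example.)

```python
# Illustration of the HRP01 slow-start math (not Citrix's actual code).
MAX_LOAD = 10000  # a load index of 10,000 means "full"

def slow_start_index(current_load, denominator=2):
    """Temporary load index after one pending logon.

    The bias is 1/denominator of the remaining headroom;
    HRP01 hard-codes the denominator at 2 (half the headroom).
    """
    return current_load + (MAX_LOAD - current_load) // denominator

# A server at 4000 receiving three simultaneous logons:
load = 4000
for _ in range(3):
    load = slow_start_index(load)
    print(load)  # prints 7000, then 8500, then 9250
```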

The key here is that these are just temporary increases. Once the users are fully logged on, the server will recalculate its “real” load index and send that value up to the data collector. The only reason these really high temporary load indexes exist is to prevent one server from getting a ton of new logons and essentially rendering an entire farm unusable.

By the way, you can disable this slow-start behavior by setting the registry value HKEY_LOCAL_MACHINE\Software\Citrix\IMA\LMS\UseILB to 0.

The Load Bias in Citrix Presentation Server 4.5

Citrix Presentation Server 4.5 allows you to further tune the load bias calculation performed by this slow-start behavior. This tuning is done via a new rule called “load throttling” that you’ll see when configuring a load evaluator in a PS 4.5 farm.

If you enable the load throttling rule, you’ll see a drop-down box called “impact of logons on load” with these options:

  • Extreme
  • High (default)
  • Medium-High
  • Medium
  • Medium-Low


Behind the scenes, this rule affects the mathematical formula that’s used when calculating the slow-start load bias. Remember how after HRP01 on PS 4, the slow-start load bias was always one-half of the remaining headroom? Well, these load throttling settings allow you to change that. Now instead of new connections consuming one-half of the remaining load, you can configure them to consume one-third or one-fourth or one-fifth (or even 100%) of the remaining load.

Mathematically, these load throttling values allow you to change the denominator in the slow-start load balancing algorithm. Recall from above that the formula is:

Temporary slow-start load index = Current Load + 1/2 of (10000 - Current Load)

Changing the values of “Logon Impact” allows you to change the bottom part of that “1/2” multiplier into some other value, from 1/1 to 1/5.

The values are as such:

  • Extreme: 1
  • High: 2
  • Medium-High: 3
  • Medium: 4
  • Medium-Low: 5

Since these values affect the bottom halves of the fractions, the higher impact choices are lower numbers, meaning they lead to higher slow-start load biases. (By the way, it should now be clear mathematically why the default value is 2, since the default behavior is that the load bias is one-half of the remaining headroom.)

But if you feel that adding one-half of the remaining load is too much, you could choose “Medium-High,” which would switch the multiplier to one-third, meaning that each logon would add only one-third of the remaining headroom instead of one-half.

Similarly, a value of “extreme” means that your multiplier would be 1/1, or just “1.” This in effect means that when a user logs on, the temporary load index will always be 10,000, since the algorithm will add the entire remaining value to whatever the current load is. A value of “extreme” means that only one person can logon at a time. (Your server can still support dozens or even hundreds of users. The “extreme” value just means they will connect one-by-one.)
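Putting the setting names and the formula together gives a small sketch like this. (The setting names come from the UI described above; the function itself is illustrative, not Citrix code.)

```python
MAX_LOAD = 10000

# "Impact of logons on load" settings mapped to the slow-start denominator.
LOGON_IMPACT = {
    "Extreme": 1,
    "High": 2,          # default: half the remaining headroom per logon
    "Medium-High": 3,
    "Medium": 4,
    "Medium-Low": 5,
}

def temporary_load(current_load, impact="High"):
    """Illustrative temporary load index for one pending logon."""
    return current_load + (MAX_LOAD - current_load) // LOGON_IMPACT[impact]

# "Extreme" pins the temporary index at 10,000 -> one logon at a time:
print(temporary_load(4000, "Extreme"))     # 10000
# "Medium-Low" adds only a fifth of the headroom per logon:
print(temporary_load(4000, "Medium-Low"))  # 5200
```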

Join the conversation

29 comments


Thanks for that Brian! It's nice when you can see the progression Citrix has made from near nothing to a fairly advanced equation. Good work.

We've used load evaluators based on user and CPU load for some time now, and this has worked really well for us.  We have terminal servers that typically house around 35 users, and can put a new server in the farm, and logons don't flood onto the server to cause the black hole effect.


It's very interesting for me to compare the Load Evaluator mechanism used by CPS to the PowerTerm WebConnect Load Balancer. As is the case in almost every aspect of the PowerTerm WebConnect architecture compared to CPS, a much simpler layered design was used. In the context of the Load Balancer this means that a small Load Balancer Agent is installed onto the Terminal Servers to collect load data. This data is transmitted to one or more central Load Balancing Servers that actually perform the server selection process. As a result, the Load Balancer Agent is very small, approximately 300KB, and places a negligible load on the Terminal Servers. Another advantage is that almost all configuration of the server selection process is done on the central Load Balancing Server, and each configuration change is instantly applied across the entire farm.

PowerTerm WebConnect also provides a mechanism for logon throttling and overcoming the "black hole" effect which I'll describe on my blog. For more information on the architectural differences between CPS and PowerTerm WebConnect see my BriForum presentation.

Dan


I forgot to mention another benefit of the PowerTerm WebConnect architecture: a single PowerTerm WebConnect Load Balancer can handle over 4,000 Terminal Servers. This was actually verified by one of our customers.

Dan


Hey Dan,

How is this different than Citrix's architecture? In the case of Presentation Server, the selection process and all that is also performed by a central server (the zone data collector). The data collectors receive the load indexes from each server and they're the ones that perform the selection process. In the case of Presentation Server also, all configuration is done centrally (even the registry keys mentioned in this article are only for older versions of Presentation Server).


You are correct Brian, in both cases the selection is performed by a central server (BTW the new LH Session Broker is different in that it also supports a model in which any TS can make the selection, though based on the same data). The difference is where the calculation is performed, and consequently, what data is transmitted. In the case of CPS I understand that the calculation is performed on each server and the data transmitted is the result of the calculation. In the case of PowerTerm WebConnect the raw data is transmitted and the calculation is performed centrally. While this difference is not that significant in the context of Load Balancing, it is indicative of the overall difference in architecture between the two products.

I obviously do know that CPS is centrally managed and configured. It's just that in reading your description of the Load Bias mechanism I got the impression that that aspect was locally configured via the registry. Thank you for clarifying this point. BTW does Citrix provide any guidance to help determine which "Logon Impact" setting to use? Also, how often do you find yourself customizing the Load Evaluator for a farm on a server-by-server basis?

Dan


I've just posted a description of how Logon Throttling works in PowerTerm WebConnect on my blog.

Dan


I think that the average number of users that people are seeing per-server is going to have to increase in order for this equation to be effective. 

Current Situation:

In an environment with an average user load of 3500, the evaluator is set at 1/2 as a new server comes online.  Load is set at 5000 (0 + 1/2 of (10,000 - 0)).  New users are directed to a different server for logins. 
Result: only 1 user at a time can log in

New _least sensitive_ Situation:

Same environment, with the load evaluator set to 1/5 (the least sensitive).  Load is set at 2000 for the first login (0 + 1/5 of (10,000 - 0)), then 3600 for the next login (2,000 + 1/5 of (10,000 - 2,000)). 

Result: Two logins are tolerated at once. 

 My only point is that until we actually see 80 users per server, the options between 1 and 5 are really moot points.  Personally, I haven't found many servers that can keep up with 50+ users at a cost-effective price point.  Opinions?


Hi Ben,

I don't understand where your numbers are coming from? If the starting load is 3500, that means the remaining load is 6500, and 1/2 of that is 3250, so the load after one user is 3500 + 3250 = 6750. Then if a second simultaneous user logs in, the starting load would be 6750, meaning the headroom is 3250. Half of that is 1625, which when added to the original load of 6750 you get 8375.

My point is that the load will never hit 10,000 (unless it's set to 1/1, or "extreme"), it will only approach 10,000. So a ton of users hitting your farm at the exact same time will always be let in... even 100 users connecting to 5 servers at once.

The only real use for this is when you have one server that is WAAAY less loaded than the others.. like if it reboots randomly.
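A quick sketch to check these numbers (the helper function is hypothetical, not Citrix code):

```python
MAX_LOAD = 10000

def next_index(load, denominator=2):
    # Hypothetical helper: add 1/denominator of the remaining headroom.
    return load + (MAX_LOAD - load) / denominator

load = next_index(3500)
print(load)  # 6750.0 after the first simultaneous logon
load = next_index(load)
print(load)  # 8375.0 after the second

# With the default denominator of 2, the index approaches 10,000
# but never reaches it:
for _ in range(20):
    load = next_index(load)
print(load < MAX_LOAD)  # True
```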


There is another issue. In your first scenario if you have 9 fairly loaded servers and a new server comes online, and then 10 users logon at once, each will be directed to a different server. Now you might say that this is good: the logon load is evenly distributed. But what about after the logon is done? Your new server is still hardly loaded while the other servers are even more loaded than before. As more groups of users logon it will eventually even out, but only when the servers are approaching their maximal capacity. Or am I missing something?

Dan

It sounds like the VMware Distributed Resource Scheduler concept is needed for Citrix here... unless there is something already in place in the architecture I am completely unaware of.

Brian,

I might be getting this completely wrong, but how can you realistically find yourself facing "the black hole effect" unless you have not planned for spare/additional capacity in the farm (I am talking about the number of servers you provision for your production farm) to allow for things like servers down for maintenance, busy logon times, unforeseen emergencies, etc.?

Assuming you know the number of MAX concurrent users you can get at any time in your environment, it appears insane to allow yourself to be facing this situation, unless of course you work for organisations that don't plan for redundancy and/or don't make resources available where they should? (I know we don't live in a perfect world also)

 Mike


Michael,

Please read Brian's description of the "black hole" scenario carefully. If you do you will see that unless some sort of logon throttling mechanism is put in place, this problem can happen almost regardless of the amount of spare capacity that you have. Even if you have double the capacity that you need your load balancer will distribute the users so that there is some load on all your servers. Now if you bring a new server online this server has zero load. Because zero is less than some, without logon throttling all new users will be directed to that server until the load balancer notices the increased load on that server. And if your load evaluator is only session based, and does not take CPU and such into account, the load balancer may not even notice that increased load at all.

For this reason, if you have a larger farm in place you need to make sure that the SBC solution you use supports logon throttling and that it is enabled and properly configured.

Dan


Hey Brian,

I'm really only talking about the "random reboot" scenario, where a server starts with a 0 load level compared to other servers with load levels of around 3500. If the load evaluator multiplier is set at 1/2, even 1 user logging in will set the new server's adjusted load level to 5000 (0 + 1/2 of (10,000 - 0) = 5,000), causing user logins to be directed to other servers in the farm.

If the load evaluator multiplier is set at 1/5 in the same scenario, the load of 1 user logging in would be 2000, then 3600 for the next user: 0 + 1/5 of (10,000 - 0) = 2,000, then 2,000 + 1/5 of (10,000 - 2,000) = 3,600.

I agree with you that this is a useful feature, but the adjustments between 1/2 and 1/5 are only usable if you're using the "scale up" approach to building PS farms.


Thanks for the clarification Dan,

I need to spend a bit more time digesting it :)

Michael

Hmm.. That's a good point. But it depends on how you set this multiplier. You could very easily make it so that of your 10 new logons, maybe 5 go to the one server and the other 5 are distributed around to the rest. And in this case, you have to make the decision as to whether you want your "other" servers more loaded, or users denied because the one logon server is too busy.

I see your point Ben, and I agree to some extent. However, also keep in mind that you can somewhat control what general operating range your load indexes will be in. For instance, if you build a rule that says max users equals 1000, then if your environment has only 40 users per server, your average load index will only be 400, and your scenario is exactly right. However, if you lower your max value in your rule, this might not have the same impact.

But I do agree with your point. It would almost be cooler if this could somehow take into account the current average load in the silo, instead of basing it on the difference between current load and 10,000. 


I can definitely see a usefulness for this if you've lowered your max load with the server user load evaluator.  I'm only using the cpu, memory and disk IO load evaluators, since R01 and slow start LB at least.


Brian my point is that you shouldn't be required to make this decision. The root of the problem is that CPS does load throttling by modifying the load bias. This makes the number of sessions directed at the new server also dependent on the load on the other servers in the farm. This is patently incorrect because the number of simultaneous logons a server can handle has nothing to do with the load on the other servers. For this reason I prefer the method we use in PowerTerm WebConnect where this dependency does not exist.

Dan

Moving users, sessions intact, to another Citrix server is the holy grail of Citrix computing. It solves so many problems and gives us so many useful options that I doubt it is even possible. :P

Part of the confusion is that the effect described here is only a subset of the "Black Hole Effect". This one was a problem of architecture, so easier to solve. The real "Black Hole Effect" is the situation where a server's IMA service has "half way" crashed. It reports itself to the ZDCs as available, but the IMA service won't take any connections.

I see this issue far less than I used to, but it would be a shame if we allowed Citrix to redefine this bug as just this one subset of the problem.


For example, I have a farm where, due to an issue with a published application, we had to resort to a load evaluator that takes only the number of users into account (while using CPU as an evaluator, we began to have issues because servers started reporting a load of 10000 with 3 or 4 users taking up all available CPU on the server... talk about a buggy application).

Anyways, currently I'm using an ILB modifier of 3, so I see (and by using the formula it checks out) that a throttling load of 3333 equals one user logging in, 5555 equals 2 users, and so on. But to me it looks like the load values in each column are not "interacting" with each other to give a net load value. Is that supposed to happen like this? From the article I got the understanding that the load values (actual and temporary) should add up, but it does not appear to be happening in my farm.

Fabio


I had done a copy/paste from notepad and part of the post got cut off (my mistake!).

I'd like to elaborate a bit more on my scenario... My farm has 3 servers and an average of 60 users per server. Therefore I set the Users load evaluator to 100 users; that makes the load value really easy to calculate... 1000 = 10 users :). We ended up using 100 as the number of users for the evaluator to accommodate everyone on two servers in case one of them became unavailable (they appear to be able to handle that many ppl).

So what I usually see when I run qfarm /ltload is the number on the first column around the 6000 mark, and the number on the second column varying between 0, 3333, 5555, 7036 and 8023 (0 to 4 users logging in).

And the curious thing is that if I run the traditional qfarm /load command, IF the number from the load throttling is higher, it is the one that appears in the server load value. In practice it looks to me that the data collector reports the higher number between server and throttling loads.

Any ideas on what I'm missing here?

Fabio


In a DR situation where 1 datacenter goes down completely, we will be faced with 7000 users trying to access the other datacenter all at the same time. Let's assume the WEBI and XML brokers can handle the app enumeration load. And let's also assume that the ZDC can handle the app resolution load.

Our servers have been certified to handle 120 sessions. If all 7000 users attempt to select the 1 application icon, the logons would be spread out evenly, but that will kill the servers.

Is there a way to tell the ZDC to check the time of the last logon on any given server and do not send the next logon to that server within 30 seconds?

We can live with making the user wait 30 seconds before getting their ICA file...

Citrix has the facility to limit the number of connections for one user. Our administrators have set this to 1, yet a user can log on twice. This has been explained to us as a conflict between the application code trying to optimise memory use by allowing multiple sessions and the facility trying to limit it. Is this true? Is there a solution?

My experience is that logons are expensive (as Brian puts it) in terms of both CPU and disk I/O (loading the user profile and applying security policies). However, the assumption that if production servers are relatively busy, and a new server comes online, all users would be directed to that server is only true if you're using the default load evaluator (which many admins do). A load evaluator that only looks at user session count is not very intelligent. That's how you crash servers or at the very least run into logon denials because the XML service or license verification process times out. At a minimum, CPU and disk have to be calculated in evaluating the load. Having said that, as users log on to this new server with "0" load, its load will go up exponentially as a number of users log on. The DC will then direct subsequent logons to other servers in the farm that evaluate to a lesser load.

If an application pegs a server with a more advanced load evaluator applied, the answer is not to go back to the default evaluator. That's playing the ostrich, and your User community will feel the pain. Applications that hog resources badly are probably not good candidates for Terminal Services. If politically, that fight is lost, then consider Silo-ing. Configuring an Isolation Environment won't help in that scenario.

I once had an application that worked with the BlackIce printer driver (to do screen grabs of certain fields on the screen and produce a .pdf) that after invoked, would stay resident and consume 24% CPU. This was not discovered during the testing phase. Needless to say, the only real solution was to migrate that application to local PC's.

Citrix has come up with a couple of new tools like load throttling and Edgesight. The truth is that any good server admin can identify an application or process that's causing the headaches - using built-in tools. Mitigating the phenomenon is the challenge. The main problem I see is that the counters are "long in the tooth". For instance, the Pages/sec and Context Switches/sec default values were established for 533MHz class servers with 32MB RAM! Citrix should figure out how to scale the default values based on parsing the Sysinfo of the machine it's being installed on. In the meantime, we have to adjust (read: SWAG) the counters based on establishing a baseline for a given hardware config. Lazy on their part for such a mature product.


Sam, it's thanks to buzz that I noticed your response to my old comment.

> A load evaluator that only looks at user session count is not very intelligent

I couldn't agree more, which is why I wrote a post on my blog last year titled "Counting Sessions Is Not Enough". Also, the fact that this is a default that can be changed is not good enough IMO because, as you point out, many admins just stick with the default.

> Citrix has come up with a couple of new tools like load throttling and Edgesight

I agree that load throttling is important which is why we've had this feature built into our load balancer from the very first. Good performance monitoring tools are also critical and so we've bundled RTO PinPoint, which I believe is the best application performance monitoring solution for SBC in the market today.


Thanks for replying Dan. Just caught that. I hate chiming in on old posts, but this was a good one (and excellent article). Sometimes it's good to understand features to this degree, but if the vendor provides a tool that works, I try not to circumvent it. My experience thus far has been with CPS XPa and MPS 4.0e; the "Load Throttling" feature didn't come out till 4.5. IMO, a really big part of the key to keeping the logon process as lean as possible is to use mandatory profiles that have been put on a "Miami diet" (mine were ~256k in size) and to ensure your Group Policies are just what is needed and not redundant (another "you're-on-your-own" process). I found the GPMC and RSoP snap-ins to be invaluable tools.

I must say that in the scenarios discussed, I did experience these very issues (esp. after a server bounced) - 'till I created my own load evaluators based on real indicators of server load. Still, as I said, the counters had to be modified as the Resource Manager was reporting full loads (on certain counters) when they did not exist.

I understand that Citrix customers might install this product on just about anything, but if the installer were more intelligent, it could set these values based on class and # of CPUs, page file size, and system disk bus speed. That would be most helpful, or at least a better starting point.

Does ILB work with Citrix XenApp 4.5 Advanced Edition?
