This article provides an independent view of TScale's features and explains how they work. (Full Disclosure: RTO Software has sponsored BrianMadden.com in the past.)
Why Terminal Server Applications Are Not "Optimized" by Default
In order to truly understand how TScale's application optimization works, you need to first understand how Windows applications use memory and the page file. (For more information about how applications use memory in Terminal Server environments, download my free 38-page whitepaper "Terminal Server Performance Tuning.")
Simply put, TScale's application optimization capabilities function by changing the way that applications use memory and the page file. A common misconception about the Windows page file is that it's "merely" an extension of physical memory used on computers that don't have enough memory. Most people think that if they buy enough physical memory, they'll never have to worry about the page file. In Terminal Server environments, nothing could be further from the truth. While it's true that the page file is used more when physical memory is scarce, Windows also uses the page file when memory is plentiful.
Windows is smart enough to only load a single copy of a binary executable into memory even when multiple processes (or users) share a single application. Windows utilizes "copy-on-write" optimization that will make an additional copy of a portion of the application in memory only when a process attempts to write to it. However, in Microsoft Windows environments, every executable or DLL is modified in memory as it's used. (This doesn't mean that the EXE or DLL files on the disk are written to. It simply means that once they're loaded into memory, the versions in memory change as they are used.)
For example, multiple instances of a DLL loaded into memory share the same memory areas. However, as soon as a process tries to write to a portion of that DLL, the system makes a quick copy of it (via the "copy-on-write" functionality) and lets the process write to the copy instead. Additionally, the system also backs up that section of that DLL to the page file for safekeeping. This means that there are effectively three copies of that portion of the DLL in memory: the original, the copy for the other process to write to, and the backup in the page file. This same phenomenon occurs for every program that is shared by multiple processes, including EXE and DLL files.
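To make the copy-on-write idea concrete, here is a toy Python model. It is only a sketch of the behavior described above, not how the Windows memory manager is actually implemented, and the class names (SharedPage, Process) are made up for illustration.

```python
class SharedPage:
    """One page of a DLL image, loaded once and shared by every process."""

    def __init__(self, data: bytes):
        self.data = data  # immutable original


class Process:
    """A process that maps the shared page and copies it on first write."""

    def __init__(self, name: str, page: SharedPage):
        self.name = name
        self.page = page
        self.private_copy = None  # created lazily, on the first write

    def read(self) -> bytes:
        # Reads come from the private copy if one exists, else the shared page.
        if self.private_copy is not None:
            return bytes(self.private_copy)
        return self.page.data

    def write(self, offset: int, value: int) -> None:
        # Copy-on-write: duplicate the shared page before the first write.
        if self.private_copy is None:
            self.private_copy = bytearray(self.page.data)
        self.private_copy[offset] = value


page = SharedPage(b"\x00\x00\x00\x00")
user1 = Process("user1", page)
user2 = Process("user2", page)
user2.write(0, 0xFF)  # only user2 gets a private copy
```

After the write, user1 still reads the untouched shared page while user2 reads its own modified copy. The page file backup is left out of this model.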
In regular Windows environments, backing up the copy-on-written section of an executable to the page file is no big deal. However, imagine how inefficient this is in Terminal Server environments!
Think about a Terminal Server hosting 30 users who are all using a fat client application such as JD Edwards OneWorld. This application is a standard client / server application, and launching the JD Edwards client software loads an executable and several DLLs into the memory space of each user's session. However, Windows only initially loads a single copy of the executables into physical memory.
As all 30 users utilize the application, Windows' copy-on-write optimization creates 29 "copies" in memory of large portions of each JD Edwards executable. (One for the first user and one copy for each of the 29 other users.) This means that Windows will have also placed an additional 29 copies of the original executables in the page file before the copies were made, ultimately meaning that 30 JDE users will cause 59 copies of the single executable to be loaded in memory (the original, 29 copy-on-write copies, and 29 page file backups). Now, imagine this multiplied by each of the many EXEs and DLLs that JD Edwards loads!
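The arithmetic behind that number is easy to check. The short Python sketch below simply restates the counting model from the paragraph above (one original, plus one copy-on-write copy and one page file backup for every user after the first). The function name is hypothetical, and the model assumes the worst case where every additional user triggers a copy.

```python
def copies_of_executable(users: int) -> dict:
    """Count copies of one executable under the article's worst-case model."""
    if users < 1:
        raise ValueError("need at least one user")
    cow_copies = users - 1         # one private copy per user after the first
    page_file_backups = users - 1  # one backup made before each copy
    return {
        "original": 1,
        "cow_copies": cow_copies,
        "page_file_backups": page_file_backups,
        "total": 1 + cow_copies + page_file_backups,
    }


counts = copies_of_executable(30)
print(counts["total"])  # 59
```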
Unfortunately, this is all too common. These fat client / server applications were never really designed for Terminal Server environments. Applications like JDE OneWorld, Cerner, Lotus Notes, Siebel, PeopleSoft, SAP, and others all load massive client environments when they're launched. (This usually includes the core EXE plus several DLLs.)
Obviously, this copy-on-write behavior is not very efficient. It seems like there should be something you can do to change this behavior. Unfortunately, Windows' copy-on-write "optimization" is part of the core Windows memory management components, and you can't just turn it off. As Kevin Goodman (RTO's CTO) puts it, "it's not like there's a 'NoCopyOnWrite' registry flag that you can use to disable it."
TScale's Application Optimization
When installed on a Terminal Server, TScale watches how applications use the memory they've been allocated and how multiple instances of an application are negatively affected by the Windows copy-on-write optimizations. It logs potential optimizations to an optimization map file on the server's hard drive. Then, the next time a user launches the application, the server reads the optimization map.
The optimizations stored in the optimization map file allow multiple instances of an application to share the backup copies in the page file. This dramatically cuts down on page file usage, which in turn frees up the processor to support more users. TScale also increases the efficiency of how applications use physical memory, freeing up memory that can allow you to support more users. Each application on a Terminal Server is analyzed separately, and each has its own optimization map.
TScale really shines with the big client / server applications. In fact (and quite ironically), the most common applications that TScale doesn't affect too much (maybe 10% more users instead of 30% more) are applications from Microsoft, such as Office, Visio, and Project. (It's almost as if the folks writing these applications in Redmond know something about the way Windows works that no one else does.)
TScale's Application Shaping
While the application optimization components of TScale are powerful by themselves, Terminal Server administrators have also long struggled with applications that spin out of control and cause a server to perform poorly. The new application shaping capabilities of TScale 3.0 allow you to finely control how applications use CPU resources. TScale addresses application shaping in two different ways: application priority and application affinity.
TScale's application priority capabilities allow you to dynamically adjust the priority of an application when it starts to run away with CPU resources. TScale implements this functionality by controlling the priority of each application's processes. This, in turn, allows you to specify which processes should be able to preempt other processes.
With TScale 3.0, you can create rules and policies based on such factors as the CPU threshold at which shaping is invoked, how long the application has to stay at that level, which users are subjected to or exempted from shaping, and during which hours shaping is in effect. When a rule kicks in, TScale responds by dynamically increasing or decreasing the priority of the affected process. (This priority setting is the same setting that you can set manually via Task Manager by right-clicking on a process.)
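To give a feel for how a rule like this might be evaluated, here is a minimal Python sketch. It is not TScale's actual rule engine or configuration format. The ShapingRule fields, the function names, and the sampling approach are all assumptions made for illustration, and the code only makes the decision. It does not touch any real process.

```python
from dataclasses import dataclass, field

# Priority names as shown in Task Manager, lowest to highest (Realtime omitted).
PRIORITIES = ["Low", "BelowNormal", "Normal", "AboveNormal", "High"]


@dataclass
class ShapingRule:
    cpu_threshold: float   # percent CPU at which the rule is invoked
    sustain_samples: int   # consecutive samples that must meet the threshold
    exempt_users: set = field(default_factory=set)
    active_hours: range = field(default_factory=lambda: range(24))


def rule_applies(rule, cpu_samples, user, hour):
    """Return True if the process should be shaped right now."""
    if user in rule.exempt_users or hour not in rule.active_hours:
        return False
    recent = cpu_samples[-rule.sustain_samples:]
    # Fire only if we have enough samples and all of them meet the threshold.
    return (len(recent) == rule.sustain_samples
            and all(s >= rule.cpu_threshold for s in recent))


def lower_priority(current):
    """Step one priority level down, stopping at the lowest level."""
    index = PRIORITIES.index(current)
    return PRIORITIES[max(index - 1, 0)]


rule = ShapingRule(cpu_threshold=80, sustain_samples=3,
                   exempt_users={"admin"}, active_hours=range(8, 18))
if rule_applies(rule, [50, 85, 90, 95], user="alice", hour=10):
    new_priority = lower_priority("Normal")
```

In this example the last three samples are all at or above 80%, the user is not exempt, and 10:00 falls inside the active hours, so the process would be stepped down from Normal to BelowNormal.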
RTO feels that the priority-shifting method of rogue application control is much preferred over "clamping" methods that limit a particular process to a certain percentage of CPU time.
I asked Bernd Harzog, RTO Software's CEO, about why they chose to implement process-based priority settings instead of CPU clamping. He responded, "First of all, there is nothing wrong with a process using what is left over after all of the well behaving processes have gotten what they need (the rogue process simply gets what would have gone to the idle process which is a wasted resource). It is also somewhat arbitrary (and difficult) to pick the right level for each application (and better to let the OS do this dynamically). Finally, it is really intrusive to the operation of an application to suspend the threads of an application long enough to force its execution down to a specified level of CPU utilization (not withstanding the fact that doing this uses up a lot of CPU resource all of itself). So, for these reasons we have implemented Priority and not Clamping, and think we have done the better thing for the customers."
TScale 3.0's application affinity capabilities allow you to specify which CPU (in a multiprocessor or Hyperthreading-enabled system) an application should be run on.
One of the problems with multiprocessor Terminal Servers has been that application threads don't necessarily make the best decisions as to which processor they should run on. TScale's application affinity lets you quarantine problem applications and precisely control what runs where.
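As background, Windows represents processor affinity as a bitmask in which bit n stands for logical processor n. How TScale stores or applies its own affinity settings isn't covered here, so treat the following Python sketch as a general illustration of that bitmask, with hypothetical helper names.

```python
def affinity_mask(cpus):
    """Build an affinity bitmask from a list of zero-based CPU numbers."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu  # set the bit for this logical processor
    return mask


def cpus_in_mask(mask):
    """List the zero-based CPU numbers whose bits are set in the mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


# Confine a problem application to CPUs 2 and 3 of a four-way server.
quarantine = affinity_mask([2, 3])
print(bin(quarantine))  # 0b1100
```

Keeping a rogue application on CPUs 2 and 3 leaves CPUs 0 and 1 free for everything else, which is the "quarantine" idea in a nutshell.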
TScale's optimizations have worked so well in some environments that administrators started to run into other system limits, such as Windows 2000's maximum registry size of 160MB. (In Terminal Server environments, each logged on user has their own HKCU registry hive. On large servers, hitting the maximum registry size limit prevents additional users from logging on.)
To mitigate this, RTO added registry optimization capabilities to TScale 3.0. This capability defragments the registry, thereby reducing the amount of memory used by the registry in Terminal Server environments. Exactly how this will affect you depends on whether you're running Windows 2000 or Windows 2003.
In Windows 2000, the entire registry is stored in the kernel's non-paged pool. The registry size limit comes into play because the non-paged pool space is limited, and it is used for a lot more than just the registry. By compressing the registry, TScale 3.0 allows it to grow larger than its "true" 160MB. If a Windows 2000 Terminal Server has run out of registry space (an easy item to check via the Performance MMC), then enabling registry compression via TScale will let you fit more users on the server. (It also frees up memory for those users.)
Windows Server 2003 has no registry size limit (since the entire registry is no longer stored in the kernel's non-paged pool). However, the registry must still be stored in "regular" virtual memory, and enabling registry compression can free up memory for other uses.
RTO claims that enabling registry optimization leads to a 10 to 15% reduction in registry memory usage.
Pricing and Availability
RTO announced TScale 3.0 in early October, and it should be available in November. Even with all the new features, pricing is the same as TScale 2.0. (MSRP of about $500 for a single CPU server, $1000 for dual processor servers, $2000 for quads, and $3000 for 8-ways.)
Maintenance is available for about 18% per year. (Current customers who bought maintenance for their TScale 2.0 licenses are, of course, entitled to receive TScale 3.0.)
RTO Software has always suggested that using TScale is cheaper than buying, installing, and managing additional servers. TScale is really becoming the de facto standard in many server-based computing environments, and the new capabilities of TScale 3.0 will certainly further its reach. However, TScale's effectiveness is driven by the particular aspects of each environment. For that reason, RTO offers a free 30-day evaluation that can be downloaded from their website at rtosoft.com.