Troubleshooting Slow Logons - Terminal Services for Microsoft Windows Server 2003, Advanced Technical Design Guide - BrianMadden.com
Brian Madden Logo
Your independent source for desktop virtualization, consumerization, and enterprise mobility management.

Troubleshooting Slow Logons

Written on Aug 09 2007 20,571 views, 1 comment


by Brian Madden

All of the problems mentioned previously can be annoying, but slow logons seem to be the curse of so many Terminal Server environments. There are plenty of real-world situations in which people are completely happy with their servers except for the fact that logons are too slow.

What exactly is a "slow logon?" It depends on your environment. Logging on to a Terminal Server and establishing a new session will never be as fast as connecting to a previously disconnected session. Some companies have logon processes that complete in a few seconds, while others take a few minutes. Unfortunately, there are some environments in which the logon process takes several minutes, and even 20–30 minutes is not unheard of. This is the situation that we will troubleshoot here.

Understanding the Terminal Server Logon Process

The logon process in Terminal Server environments is interesting, especially since it doesn't necessarily relate to the usability and overall feel of the server in general. As you've learned thus far, a series of events happen on a Terminal Server from the time a user clicks the "connect" button until the time his application or desktop is presented to him. In order to successfully troubleshoot slow logons, you need to fully understand what happens during the logon process. Only then can you begin to investigate which aspect of the logon process is holding things up.

Building on what was covered in earlier chapters, the following sequence of events occurs when a user logs on to a Terminal Server:

  1. The user clicks the connect button.
  2. In session directory-enabled Terminal Server 2003 environments, the server routes the user to the server that is hosting the user's disconnected session. (See Chapter 7 for details.)
  3. The server negotiates with the requesting client for its encryption level and virtual channel capabilities.
  4. The user is authenticated to the domain and his rights are checked for access to the connection.
  5. The licenses are verified. The server client access license is verified first, and then the Terminal Server client access license is verified. (See Chapter 4 for details.)
  6. The Terminal Server determines the user profile type and loads one if necessary. (See Chapter 6 for details.)
  7. The Terminal Server applies any GPOs that have been configured. (See Chapter 6 for details.)
  8. The Terminal Server launches any applications as specified in the policies. (See Chapter 6 for details.)
  9. The server executes the contents of the "run" registry values.
  10. The server runs the user's logon script(s). (See Chapter 6 for details.)
  11. The server runs any programs in the Startup folder of the user's Start Menu.

The above steps take place each time a user logs on. If you're using Citrix MetaFrame, then you can add more steps to address local time zone estimation and load calculations.

Now that you've seen what happens (or could potentially happen) each time a user logs on, you can start to trace this process in your environment to pinpoint where the delay could be.

Here's the approach that most people take, beginning with the easy steps and progressing to the more difficult ones:

  • Isolate the problem.
  • Check the roaming profile.
  • Check for anything that runs, executes, or is loaded when users log on.
  • Check for any other actions that take place when users log on.
  • Trace the logon process with server debug logging options.

Step 1. Isolate the Problem

Before you can start troubleshooting the performance of the logon process, you should get a feel for the situations in which slow logons occur. To do this, collect as much information about the symptoms of the problem as you can. Some potential questions to ask include:

  • Are logons slow for some users or all users?
  • If you're using Citrix MetaFrame, does a user with a slow logon via ICA also experience a slow logon via RDP?
  • What happens if a user with a slow logon logs in via the server console?
  • Are logons slow at all times of the day, or just sometimes?
  • Do users experience slow logons every time, or is it sporadic?
  • Is there anything else that the slow logon users have in common? Are they all in the same domain group or OU? Are they all in the same building? Do they all have the same type of client device or desktop image? Are they all accountants?

Answering these questions will help you frame your investigation. For example, if you determine that slow logons only occur via ICA and not via RDP sessions, then you'll be able to focus your efforts on the components of the ICA session startup process that differ from the standard Terminal Server startup process.

Once you've isolated the symptoms of the problem, you can move to Step 2 with the information you need to make an intelligent diagnosis.

Step 2. Check the Roaming Profile

Although checking roaming profiles doesn't logically seem like the best place to start, roaming profiles are probably responsible for 95% of all slow logon issues in Terminal Server environments. The proper use and design of profiles in Terminal Server environments covered in Chapter 6. If you're not using roaming profiles, skip directly to Step 3.

If you are using roaming profiles, check to see how big your users' profiles are. In theory, Windows 2003 should not allow the master copies of your users' profiles to include space-wasting items such as temporary Internet files. In practice, however, roaming profiles can contain all sorts of garbage, including temp files, Internet cache, crash logs, and hundreds of files that start with "~."

Remember that in environments with roaming profiles, the entire roaming profile needs to be copied across the network to the Terminal Server each time a user logs on. (The only exception to this is if the user logs onto the same server they last logged out of and a cached copy of the roaming profile is locally stored on that server.)

If you do have huge roaming profiles, then you'll need to make them smaller. By using today's techniques of folder redirection, home drives, and profile folder exclusion (all covered in Chapter 6), it's possible to design an environment in which roaming profiles never grow to more than a few megabytes in size.

Step 3. Identify Anything that Runs when a User Logs On

If you determine that huge profiles are not causing slow logons, you should next identify everything that runs when a user logs on. In Terminal Server environments, you must check several places.

Check the Logon Script

Remember from Chapter 6 that users can get logon scripts from several locations. In addition to a script applied as part of the user account settings, you can also have scripts applied as part of GPOs. Finally, don't forget that the usrlogon.cmd file still exists (even in Windows Server 2003), and it's possible that some processes are being launched from there.

Check the Registry

There are several registry locations that can launch processes when a user logs on. On your Terminal Servers, check the following registry keys:

Key: HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon\

Value: AppSetup

Type: REG_DWORD

Data: This is a comma-separated list of executables that run at session startup.

For most servers, this list will contain UsrLogon.Cmd. (Don't forget to check UsrLogon.cmd to see what it also might be adding to the logon process.) If you're using Citrix MetaFrame, this registry value will also include "cmstart.exe." This is the program that configures the post-logon Citrix environment, including drive mapping and printer redirection.

Key: HKCU\Software\Microsoft\Windows\CurrentVersion\Run

Value: The name of a program to run.

Type: REG_DWORD

Data: The path of the program to run.

Key: HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run

Value: The name of a program to run.

Type: REG_DWORD

Data: The path of the program to run.

These two "run" keys are often used by software vendors who want to launch something at startup without giving the user the option of canceling it. This is how most of those annoying icons appear in your system tray. Each of these keys might list several programs in your environment. Just be sure you know what each one is.

Check the Startup Folders

Many people don't think to check the standard Windows Start Menu's Startup folders (both the user's folder and the all users folder). However, you'd be surprised at how many users manage to download garbage that installs itself and automatically launches via these folders.

Technically speaking, the contents of the Startup folders are launched after logon, since the Windows shell must be up and running first. However, these applications can still significantly slow the perceived logon time for your users.

Dealing with Programs that Run at Logon Time

Now that you have a list of what runs when a user logs on, what should you do? First, see if you can figure out what everything is and whether any one item is taking a long time to run. Since this is a Terminal Server, this is easy to do. Log in as an administrator and launch Task Manager or the Performance Monitor MMC snap-in. Then, log in as a user and observe the process names that are running for that user.

What should you do if you identify something that's taking too long? Delete it. If the program is required, however, there are still some tricks you can use. The best is to let the program run in its own window that doesn't affect the window. In essence, you can turn any program into a background process.

For example, the cmstart.exe application from Citrix has been known to take a long time to start, especially if users have many printers and printer mapping is enabled. Since the cmstart.exe program is invoked each time a user logs on via the "AppSetup" registry key we discussed previously, the user's session cannot start until cmstart.exe successfully completes. To combat this, delete cmstart.exe from the "AppSetup" registry location and add it to a user's logon script. If you add it to the usrlogon.cmd script (which ironically is also invoked via the "AppSetup" registry key), it will run for each user that logs onto the server.

Instead of adding a line to the logon script that simply reads "cmstart," add the following line:

start cmstart.exe

The "start" command will launch this process in its own command Window, allowing the system to move on without waiting for it to finish. You can use "start" to launch any application from a logon script with the confidence that the application won't slow down your logon process.

The key here is that the "start" method of launching programs won't make them run any faster, but it will at least let them run in the background so that your users can start working sooner.

Step 4. Identify Other Activities that Take Place at Logon Time

If you can't find a program that's slowing things down by running at logon time, then you'll need to make a list of everything else that happens at logon time.

To do this, check the configuration of all your programs. Look at what's happening with the RDP virtual channels. Do you have drive mapping enabled? How about printer mapping or port redirection? All of these options must initialize themselves when a new session starts.

You're probably wondering how much this can really affect the performance of the system. Consider these facts: disabling Citrix MetaFrame's client time zone estimation at startup allowed one company to increase their server loads from 65 to 110 users per server. Another company was able to increase from 35 to 62 users per server simply by disabling the automatic printer mapping since printer redirection takes significant time at logon as new printers are detected and installed by the spooler service. (See Chapter 4 for details.) Disable that feature or check the box that allows users to logon without waiting for their printers and watch your logon times drop.

This is not meant to be an exhaustive list of everything you should try to disable. Rather, these examples should give you some ideas of what to look for and examples of the dramatic affects they can have on logon times.

When considering the activities that take place at logon time, don't forget about non-Terminal Server and applications. For example, antivirus software is notorious for slowing down the logon process of a Terminal Server as it scans every file of the user's profile as it's loaded. Running virus software on your master profile file server instead of your Terminal Servers has great potential to substantially decrease logon times.

Step 5. Trace the Debug Logs from the User Logon Process

If after these four steps you haven't determined why you're experiencing slow logons, it's time to get your hands dirty. In Windows Server 2003, an application called "userenv.dll" is responsible for creating the entire user environment at logon time. This includes loading user profiles and applying GPOs.

Fortunately, you can enable diagnostic trace logging on all actions that the userenv.dll file conducts. These trace logs are extremely detailed, describing down to the millisecond exactly what the server was doing. For example, in addition to being able to trace the high-level logon process outlined previously, you can also see the userenv.dll verifying the profile file list build, checking for disk space, verifying ntuser.dat, loading ntuser.dat into HKCU, and replacing system variables in the path with actual variables.

By viewing this log you will see if a particular file in the roaming profile is getting stuck, if the file copy process is taking too long, or if DNS or WINS name resolution is holding the server up. It also allows you to track the application of GPOs to see if one is taking an inordinate amount of time or if you simply have too many GPOs that are slowing things down.

This log file can help you pinpoint slow logon issues related to things you might not have thought of otherwise. For example, one environment was experiencing slow logons due to the domain controller. The IT staff had been focusing their attention on the path between the Terminal Server and the file server hosting the master copies of the user profiles. However, when they looked at the userenv.dll logs, they discovered that the domain controller was so busy that it was taking 30–45 seconds to respond when the Terminal Server queried it to see where the user's roaming profile was stored. A $3,000 hardware upgrade to the domain controller saved this company 45 seconds on every user logon across 20 Terminal Servers.

You can enable userenv.dll logging by adding the following registry entry to a Terminal Server:

Key: HKLM\Software\Microsoft\Windows NT\CurrentVersion\Winlogon

Value: UserEnvDebugLevel

Type: REG_DWORD

Data: 10002 (Hex)

The data value of 10002 will enable verbose logging to a file on the server. Once you set this value, reboot your server and check for a "userenv.log" file in the %SystemRoot%\Debug\UserMode\ folder. Remember to turn this off when you're done troubleshooting it, since each user logon can easily add 100KB to the size of this log file.

Let's look at a small example from a userenv.log file. This example has been severely trimmed, and shows only a few random lines to give you a feel for the type of information that is available in this log file. (The complete log for this single user's logon process would have filled 25 pages.)

Notice that the exact time down to the millisecond is listed for every line in this log. This allows you to see exactly what's happening and where the hold-up could be.

 

09:25:30:606 UnloadUserProfile: Entering, hProfile =
<0x850>
09:25:30:606 UnloadUserProfile: In console winlogon
process
09:25:30:616 UnloadUserProfileP: Entering, hProfile =
<0x850>
09:25:30:616 CSyncManager::EnterLock <S-1-5-21-
2364083253-1420309831-852573094-500>
09:25:30:616 CSyncManager::EnterLock: No existing en-
try found
09:25:30:616 CSyncManager::EnterLock: New entry cre-
ated
09:25:30:616 CHashTable::HashAdd: S-1-5-21-2364083253-
1420309831-852573094-500 added in bucket 19
09:25:30:616 UnloadUserProfileP: Wait succeeded.   In
critical section.
09:25:30:872 MyRegUnLoadKey: Returning 1.
09:25:30:872 UnloadUserProfileP: Succesfully unloaded
profile
is local, not setting preference key
09:27:39:975 CreateLocalProfileImage: One way or an-
other we haven't got an existing local profile, try
and create one
09:27:39:975 CreateSecureDirectory: Entering with
<C:\Documents and Settings\brian>
09:27:40:052 CreateSecureDirectory: Created the direc-
tory <C:\Documents and Settings\brian>
09:27:40:052 ComputeLocalProfileName: generated the
profile directory <C:\Documents and
09:27:42:068 CopyProfileDirectory: Leaving with a re-
turn value of 1
09:27:42:106 MyRegLoadKey: Returning 00000000
09:27:42:106 IssueDefaultProfile: Leaving success-
fully
09:27:42:115 RestoreUserProfile: Successfully setup
the local default.
09:27:42:115 SetupNewHive: Entering
09:27:42:115 SetDefaultUserHiveSecurity: Entering
09:27:42:335 SecureUserKey: Entering
09:27:42:335 SecureUserKey: Leaving with a return
value of 1
09:27:42:335 SecureUserKey: Entering
09:27:42:345 SecureUserKey: Leaving with a return
value of 1
09:27:49:719 ProcessGPOs: -----------------------
09:27:49:719 ProcessGPOs: Processing extension Micro-
soft Disk Quota
09:27:49:719 CompareGPOLists: The lists are the same.
09:27:49:719 ProcessGPOs: Extension Microsoft Disk
Quota skipped with flags 0x6.
09:27:49:719 ProcessGPOs: -----------------------
09:27:49:719 ProcessGPOs: Processing extension QoS
Packet Scheduler
09:27:49:719 CompareGPOLists: The lists are the same.
09:27:49:719 ProcessGPOs: Extension QoS Packet Sched-
uler skipped with flags 0x6.
09:27:49:719 ProcessGPOs: -----------------------
09:27:49:729 ProcessGPOs: Processing extension Scripts
09:27:49:729 CompareGPOLists: The lists are the same.
09:27:49:729 ProcessGPOs: Extension Scripts skipped
because both deleted and changed GPO lists are empty.
09:27:49:862 ProcessGPOs: User Group Policy has been
applied.
09:27:49:862 ProcessGPOs: Leaving with 1.
09:27:49:872 ApplyGroupPolicy: Leaving successfully.
09:27:51:763 IsSyncForegroundPolicyRefresh: Synchro-
nous, Reason: policy set to SYNC
09:27:52:700 LibMain: Process Name:
C:\WINDOWS\system32\userinit.exe
09:27:53:139 GPOThread: Next refresh will happen in
109 minutes
09:27:57:926 LibMain: Process Name:
C:\WINDOWS\system32\ie4uinit.exe
09:28:40:507 LibMain: Process Name:
C:\WINDOWS\system32\msiexec.exe
09:28:47:967 LibMain: Process Name:
C:\WINDOWS\system32\userinit.exe

Figure 13.1: Selected sample lines from a userenv.dll log

With these tools and techniques, you should be equipped to troubleshoot slow logons. Keep in mind that the logon process always takes time, and in reality 15 or 20 seconds is probably as fast as you'll ever get it. However, if your current users are waiting a minute or two, there are steps you can take to speed up the process.

 

 
 




Our Books


(Note: You must be logged in to post a comment.)

If you log in and nothing happens, delete your cookies from BrianMadden.com and try again. Sorry about that, but we had to make a one-time change to the cookie path when we migrated web servers.

Trackbacks

Server 2012 – TerminalServices Event ID 20499 | IT-Notes wrote Server 2012 &#8211; TerminalServices Event ID 20499 &#124; IT-Notes
on Wed, Nov 13 2013 2:42 AM

Pingback from  Server 2012 – TerminalServices Event ID 20499 | IT-Notes