
High Pages/Sec & Disk Transfers/sec, in the Citrix XenApp / Presentation Server forum on BrianMadden.com


Top 500 Contributor
Points 1,097
Chris Norman posted on Mon, Mar 24 2008 2:29 PM
Hey folks
HP DL360 G4p servers
MS 2003 Enterprise Server SP2
PS 4.0 R04
8 GB of RAM and dual 3.40 GHz processors

We publish out the entire desktop (about 30 users to a server)

I'm investigating some stalls we seem to have every so often throughout the day. The users complain about latency: they type and nothing happens for a few seconds. I've tried everything I know to resolve this. I've got local text echo on (set on the Web Interface), but it doesn't seem to be helping.

The normal resource monitor shows nothing out of the ordinary; the CPU is normally bouncing between 20% and 30%, and RAM usage is rarely above 4 GB.

Today I'm running PerfMon and watching Pages/sec and Disk Transfers/sec.
I'm seeing Pages/sec jump up and down between 750 and 15,000 for an hour or so at a time. Then it calms down and sits at zero with some quick spikes to 1,500. In some instances Disk Transfers/sec follows the same peaks and dips. This seems crazy high to me. I do have the /PAE switch in the boot.ini file, but I'm starting to wonder if the RAM above 4 GB is even being used. Is this normal to see during peak times of the day?
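
For a rough sense of scale, the Pages/sec readings above can be converted into disk throughput. This is only a sketch: it assumes the standard 4 KB page size of x86 Windows, and the function name is made up for illustration.

```python
PAGE_SIZE_BYTES = 4096  # assumption: standard 4 KB pages on x86 Windows


def pages_per_sec_to_mib(pages_per_sec, page_size=PAGE_SIZE_BYTES):
    """Convert a Pages/sec reading into MiB/s of paging traffic."""
    return pages_per_sec * page_size / (1024 * 1024)


# The low, spike, and peak values quoted in the post.
for rate in (750, 1500, 15000):
    print(f"{rate:>6} pages/sec = {pages_per_sec_to_mib(rate):.1f} MiB/s")
```

Under that assumption, the 15,000 pages/sec peak works out to roughly 58.6 MiB/s. Keep in mind that Pages/sec also counts pages read for memory-mapped files, so a high value does not necessarily mean the page file itself is being hit.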

Senior Administrator (Citrix)
USI Holdings

No matter where I am i'm never where I want to be.

  • | Post Points: 140

All Replies

Top 500 Contributor
Points 650
I think it would be very interesting to invest $300 into a solid state HDD and move the page file off to it for one server in the silo and see if that helps.

Just a thought.

Mike
  • | Post Points: 20
Not Ranked
Points 235
Chris,

Thanks for the info.

I have disabled profile quotas on the terminal servers as without the proquota manager utility users don't know that their profile is over quota.

A mandatory profile probably won't fly too well here, but I will explore it.

A better solution is definitely needed.

Since I have a caching proxy server that everyone uses to go through to the Internet I am going to lower the max IE cache size to something like 1MB for Intranet traffic. That should also help.

Now there are still the cookies, and the profiles being copied off and on, to take care of.

-Ross
  • | Post Points: 5
Top 500 Contributor
Points 880
Chris

We use Notes Client version 7.02

Running PS4 on win2003 SP2

Do you have complaints from the users when they log off? Nope. Like they have to end task on something? Nope.

Did you notice if you started having this issue after Windows 2003 SP2? Yep.

Also, the same logoff (Idle Timeout) process spikes for the application executables occur on other applications and not just the Lotus Notes application. We have captured it occurring on a Sybase application and a Medical Records System application, as well as the Lotus Notes Client.

Again, we cannot reproduce the pause, because it occurs primarily on a server with over 30 sessions during logoff events like an Idle Timeout, and it only has end user impact some of the time. We have also observed it for login events, but it does not occur as frequently. Maybe 1 in 50 compared to Idle Timeout events.

But it is definitely happening. We have conclusively confirmed it occurring for our users, but just cannot reproduce it.

And it is important to be able to reproduce it, so we can get a dump file to give MS Support.

ScottC.
  • | Post Points: 5
Not Ranked
Points 235
[quote=Mike Hancock]I think it would be very interesting to invest $300 into a solid state HDD and move the page file off to it for one server in the silo and see if that helps.

Just a thought.

Mike
[/quote]

Mike,

If you invest in a solid state drive move Documents and Settings to it...

The 15K drives we use for mirroring can most definitely handle the page caching adequately; it is the user profiles and Internet cache that are swamping the disks with transfers. Caching I/O is very small in comparison.

-Ross
  • | Post Points: 20
Top 100 Contributor
Points 1,837
Yeah, that's something I was kicking around in a previous post. http://www.brianmadden.com/Forum/Topic/97090 - Not a lot of takers on that subject! Obviously not a mainstream tactic to this problem. I have since found that it's not really easy. Folder Redirection Policy doesn't provide for redirection of the folders we're talking about. I did find these two articles on the "Documents and Settings" folder. http://support.microsoft.com/kb/214470/en-us, this one's so old, it refers to "Documents and Settings" as "%SystemRoot%\Profiles\" : | This one is more recent: http://support.microsoft.com/kb/236621, but essentially reads the same. The RAM drive would have to be at least big enough to contain say, 70 user profiles. I wonder if it would break folder redirection...

As far as the 15K SAS mirrored drives? I'm not so impressed. My 2850's with 10K (3-disk Ultra 320) RAID5 with the Perc4/DC (128MB Cache) perform better IMO.

I agree that network was never a problem (except WAN as it related to IE performance), but I suspect if I were able to implement client redirection, I could open up another can-o-worms (LAN performance. You know, certain subnets with daisy-chained $15 SOHO switches on 'em ;p)

As we run the UPHClean service, and I could see that logoff activity caused as much disk I/O as logon, I thought I could try to lessen the amount of clean-up by tweaking the IE GPO I had in place. The changes I made were to keep only three days of page history, change "Check for newer versions of stored pages" from "Automatically" to "Every visit", and lastly change the "Temporary Internet Files Folder" size to like 3MB (default 128MB). After all, we already have a network appliance caching some of that stuff, right?

In an effort to better secure the environment, I shortened the Idle Timeout and Session Timeout settings on the listeners because a lot of folks were walking away and leaving their desktops (and client info) up for all the world to see. Well, this turned out to be a big mistake as I managed to increase logon/logoff activity! So, the smarter thing would have been to shorten the Windows Screen Saver (and use the Logon Screen Saver), but keep the Idle Session Timeout and Disconnection Timeout Setting reasonably long.

Another thing I did was get rid of services and protocols we didn't need, take a second look at all GPOs in the Citrix Servers OU for conflicts or redundancies, and revisit all the scripts to make sure they were streamlined (no unnecessary calls to Kix32.exe). I programmed more intelligence into my user/group drive and share mapping script because certain individuals belong to quite a few user groups and were getting excessive printers mapped (and slowing their logon experience). Sometimes there was overlap. So, I wrote a Select Case to grab the computername (to determine user location), and only mapped printers in that vicinity. None of these tweaks fixed the problem in themselves, but the aggregate does help to an extent.

In my judgement, until you run PerfMon on the pertinent hardware counters for at least a week (in datalog mode with reasonable intervals) you are simply taking educated guesses. When it's complete, bring it up in graph view. Determine which counters were being hit and eliminate the rest. Make note of the time periods in which the biggest spikes occur. Go back and create a new PerfMon log that looks at just those hardware counters plus winlogon.exe, lsass.exe, iexplore.exe, and your published applications during the aforementioned time periods. The real challenge is when you find out that it's a legacy app that you can't get rid of, or simply "of microsoft design".
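
One way to do the "make note of the biggest spikes" step without eyeballing a graph is to export the log to CSV and rank the samples. The sketch below assumes the usual PerfMon CSV layout (timestamp in the first column, one column per counter, blanks for missed samples); the server name CTX01, the sample values, and the function name are all invented for illustration.

```python
import csv
import io


def top_spikes(csv_text, counter_suffix, n=3):
    """Return the n largest (value, timestamp) samples for one counter column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Pick the first column whose name ends with the counter we care about.
    col = next(i for i, name in enumerate(header) if name.endswith(counter_suffix))
    samples = []
    for row in reader:
        try:
            samples.append((float(row[col]), row[0]))
        except (ValueError, IndexError):
            continue  # skip blank or short rows (missed samples)
    return sorted(samples, reverse=True)[:n]


# Made-up data shaped like a PerfMon CSV export (assumed layout).
SAMPLE = r'''"(PDH-CSV 4.0) (Eastern Standard Time)(300)","\\CTX01\Memory\Pages/sec","\\CTX01\PhysicalDisk(_Total)\Disk Transfers/sec"
"03/24/2008 09:00:00.000","15000.0","900.0"
"03/24/2008 09:00:15.000"," ","12.0"
"03/24/2008 09:00:30.000","750.0","40.0"
"03/24/2008 09:00:45.000","0.0","3.0"
"03/24/2008 09:01:00.000","1500.0","120.0"
'''

for value, stamp in top_spikes(SAMPLE, "Pages/sec", n=2):
    print(stamp, value)
```

The timestamps that come out of this are the windows worth re-logging with the per-process counters added.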



Samuel A. Rodriguez
Sr. Systems Administrator

  • | Post Points: 20
Top 100 Contributor
Points 1,837
Almost forgot, Ross makes a good point in his previous post with regard to FileMon. Once PerfMon gives you the short-list of culprits, other tools are very useful in drilling down. FileMon is now Process Monitor. I rather like Process Explorer. Very useful tool for server admins. http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx

Samuel A. Rodriguez
Sr. Systems Administrator

  • | Post Points: 5
Not Ranked
Points 235
[quote=Sam Rodriguez]
Yeah, that's something I was kicking around in a previous post. http://www.brianmadden.com/Forum/Topic/97090 - Not a lot of takers on that subject! Obviously not a mainstream tactic to this problem. I have since found that it's not really easy. Folder Redirection Policy doesn't provide for redirection of the folders we're talking about. I did find these two articles on the "Documents and Settings" folder. http://support.microsoft.com/kb/214470/en-us, this one's so old, it refers to "Documents and Settings" as "%SystemRoot%\Profiles\" : | This one is more recent: http://support.microsoft.com/kb/236621, but essentially reads the same. The RAM drive would have to be at least big enough to contain say, 70 user profiles. I wonder if it would break folder redirection...
[/quote]

I kicked around the RAM drive idea, but I don't think using THAT much physical memory is worth it here, and the requirements will keep growing. I think the best idea I have right now is to implement a 1U solid state disk iSCSI server. That way I can allocate, say, 8GB of solid state disk space to each of the 4 servers in the farm. I can also do this on the cheap with open source software that is tried and true here (CentOS + IET).

[quote=Sam Rodriguez]
As far as the 15K SAS mirrored drives? I'm not so impressed. My 2850's with 10K (3-disk Ultra 320) RAID5 with the Perc4/DC (128MB Cache) perform better IMO.
[/quote]

Well, the mirror only counts as 1 spindle, whereas your 3-drive setup has 2 spindles for reading. Your setup also has write-back cache to offset the RAID5 short-write penalty, while my setup has no option for write-back cache :-(.

[quote=Sam Rodriguez]
I agree that network was never a problem (except WAN as it related to IE performance), but I suspect if I were able to implement client redirection, I could open up another can-o-worms (LAN performance. You know, certain subnets with daisy-chained $15 SOHO switches on 'em ;p)
[/quote]

We have client redirection here for "My Documents", "Application Data" and "Desktop" and it works well with no network overload. I have heard of people having problems with "Application Data" redirection, but I haven't seen it here. Maybe we have apps that don't use it that heavily.

[quote=Sam Rodriguez]
As we run the UPHClean service, and I could see that logoff activity caused as much disk I/O as logon, I thought I could try to lessen the amount of clean-up by tweaking the IE GPO I had in place. The changes I made were to keep only three days of page history, change "Check for newer versions of stored pages" from "Automatically" to "Every visit", and lastly change the "Temporary Internet Files Folder" size to like 3MB (default 128MB). After all, we already have a network appliance caching some of that stuff, right?
[/quote]

Well, reducing the IE cache helped a little, but there is still the whole "Cookie" reindexing issue, where each user's Cookies folder may collect 10K+ cookies, which hurts when the index.dat is regenerated. It's these damn cookies that also cause the HUGE amount of disk I/O on log on and log off.

If I could off-load only 2 items from each user's profile, it would be cookies and Internet cache. After that I only need to worry about errant processes, which Windows System Resource Manager does a decent job of handling.

[quote=Sam Rodriguez]
In an effort to better secure the environment, I shortened the Idle Timeout and Session Timeout settings on the listeners because a lot of folks were walking away and leaving their desktops (and client info) up for all the world to see. Well, this turned out to be a big mistake as I managed to increase logon/logoff activity! So, the smarter thing would have been to shorten the Windows Screen Saver (and use the Logon Screen Saver), but keep the Idle Session Timeout and Disconnection Timeout Setting reasonably long.
[/quote]

We don't have any session timeout set. We set the idle timeout to something like 24 hours and use the screen saver feature as the security measure for unattended workstations.

[quote=Sam Rodriguez]
Another thing I did was get rid of services and protocols we didn't need, take a second look at all GPOs in the Citrix Servers OU for conflicts or redundancies, and revisit all the scripts to make sure they were streamlined (no unnecessary calls to Kix32.exe). I programmed more intelligence into my user/group drive and share mapping script because certain individuals belong to quite a few user groups and were getting excessive printers mapped (and slowing their logon experience). Sometimes there was overlap. So, I wrote a Select Case to grab the computername (to determine user location), and only mapped printers in that vicinity. None of these tweaks fixed the problem in themselves, but the aggregate does help to an extent.
[/quote]

Yes, that will speed up logon times, but isn't a burden to the disks.

[quote=Sam Rodriguez]
In my judgement, until you run PerfMon on the pertinent hardware counters for at least a week (in datalog mode with reasonable intervals) you are simply taking educated guesses. When it's complete, bring it up in graph view. Determine which counters were being hit and eliminate the rest. Make note of the time periods in which the biggest spikes occur. Go back and create a new PerfMon log that looks at just those hardware counters plus winlogon.exe, lsass.exe, iexplore.exe, and your published applications during the aforementioned time periods. The real challenge is when you find out that it's a legacy app that you can't get rid of, or simply "of microsoft design".
[/quote]

Yes, I agree a long-term monitor of these counters is needed, and I am actually thinking of using our server monitoring software to keep long-term statistics. I don't think I can gather the per-process counters for that time period, as it would be too much data.

I can tell you now though the times of high usage, 1) at 9am Eastern during employee log in, 2) at 11am Eastern, when our Mountain time employees log in, 3) at 12pm Eastern when our eat-at-desk employees start personal web surfing, 4) at 2pm Eastern when our Mountain time eat-at-desk employees start personal surfing, 5) at 5pm when our Eastern time employees log out, 6) at 7pm Eastern when our Mountain time employees log out.
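
If the monitoring software can export timestamped samples, averaging them by hour of day is enough to confirm or refute a schedule like this. A minimal sketch, assuming (timestamp, value) pairs with the timestamp format shown; the readings are invented Disk Transfers/sec values, not real data.

```python
from collections import defaultdict
from datetime import datetime


def mean_by_hour(samples):
    """Average (timestamp, value) samples per hour of day."""
    buckets = defaultdict(list)
    for stamp, value in samples:
        hour = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S").hour
        buckets[hour].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


# Invented readings of Disk Transfers/sec (assumed timestamp format).
readings = [
    ("05/19/2008 09:05:00", 900.0),
    ("05/19/2008 09:35:00", 700.0),
    ("05/19/2008 10:15:00", 40.0),
    ("05/19/2008 17:02:00", 650.0),
]

for hour, avg in mean_by_hour(readings).items():
    print(f"{hour:02d}:00  {avg:7.1f}")
```

Hourly averages keep the data volume small enough for long-term storage, at the cost of hiding short spikes inside each hour.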

I can map it out, but I am sure that is what it will show.

-Ross
  • | Post Points: 35
Top 100 Contributor
Points 1,837
May 19, 2008 - Ross Walker wrote:
I can tell you now though the times of high usage, 1) at 9am Eastern during employee log in, 2) at 11am Eastern, when our Mountain time employees log in, 3) at 12pm Eastern when our eat-at-desk employees start personal web surfing, 4) at 2pm Eastern when our Mountain time eat-at-desk employees start personal surfing, 5) at 5pm when our Eastern time employees log out, 6) at 7pm Eastern when our Mountain time employees log out.

Gee Ross, sounds familiar ;p

For those that publish desktops with full functionality (Office Pro, Adobe Reader and IE with multimedia plug-ins), I think performance issues will remain, particularly with x86 machines. x64 hasn't scaled the way I thought it would. Like Chris, I saw excessive paging even though installed memory was utilized at only 30%. So much for the MCSE answer to install memory when experiencing excessive paging (the same resolution reply since NT 4.0)! I'm often surprised at how little input there is on real-life pertinent threads like this one. Maybe that's why many of the consultant outfits keep things simple by not publishing desktops at all. Many organizations provide applications through the Web Interface exclusively. Nothing wrong with that; it just doesn't present the end user with the richest, most intuitive computing environment IMO. Anyway, off the soapbox I go.

BTW - That last paragraph in my 2nd to last post (RE: PerfMon/Process Explorer) was really suggested to the originator of this thread, Chris Norman, who I hope finds a suitable resolution - and posts it back here...

Samuel A. Rodriguez
Sr. Systems Administrator

  • | Post Points: 5
Not Ranked
Points 150
Ross,

For a start I'd make the following changes:

Remove cookies from the users' profiles when they log off. It's an easy change with a GPO and will have a massive impact on I/O resources.
Delete Temporary Internet Files when a user closes IE.

How long are your logons currently?

Rgds

Andy Friar
  • | Post Points: 35
Top 500 Contributor
Points 1,097
Over the weekend I did a good bit of work. I put in 8 G5 servers with the battery cache and added the battery cache to 20 existing servers. I set all the array cards to 50% read / 50% write. I also broke the teaming on all the G5 servers and just went with one NIC. The Broadcom NICs on the G5 servers also have the "TCP Offload" and "RSS" features, which I disabled.

This is day 2 since I made these changes. I'll post again on Friday to let you guys know how the week went. But so far I see a difference in I/O.
I still see Pages/sec get really busy at times, bouncing around 170 with quick spikes all the way to 2,000. But the Disk sec/transfer stays at zero all day. I have people reporting back to me from the field, and I'm trying to find someone experiencing the typing delay. Monday was a good day. People reported no delays, so I have high hopes.

BTW Adding the Battery cache to the array was the cheapest upgrade you can do for a server. I believe the cost was about $94.

I'll post on Friday and let you guys know how the week went.

Senior Administrator (Citrix)
USI Holdings

No matter where I am i'm never where I want to be.

  • | Post Points: 5
Top 500 Contributor
Points 1,097
[quote=Sam Rodriguez]
BTW - That last paragraph in my 2nd to last post (RE: PerfMon/Process Explorer) was really suggested to the originator of this thread, Chris Norman, who I hope finds a suitable resolution - and posts it back here...
[/quote]


Thanks Sam, I keep all the Sysinternal tools on all the servers. I'd be lost without them.

Senior Administrator (Citrix)
USI Holdings

No matter where I am i'm never where I want to be.

  • | Post Points: 20
Not Ranked
Points 235

Thanks Andy,

I would love to mow down users' cookies on log off (kill 'em all, let God sort 'em out), but unfortunately users have become dependent on them to remember their site usernames and passwords. Sad, sad. I am exploring a way to purge all cookies not accessed in, say, 30 days or more, which should help.
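
A possible shape for that purge, as a sketch only: it assumes NTFS last-access timestamps are being updated on the server (they can be turned off), that it runs against a profile's Cookies folder while the user is logged off, and that index.dat should be left alone. The function names and defaults are made up for illustration.

```python
import os
import time


def stale_files(folder, max_age_days=30, now=None):
    """List files in folder whose last-access time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for entry in os.scandir(folder):
        # Assumption: leave index.dat alone, since IE maintains it itself.
        if not entry.is_file() or entry.name.lower() == "index.dat":
            continue
        if entry.stat().st_atime < cutoff:
            stale.append(entry.path)
    return sorted(stale)


def purge(paths, dry_run=True):
    """Delete the given files, or only report them when dry_run is set."""
    for path in paths:
        if dry_run:
            print("would delete", path)
        else:
            os.remove(path)
```

Running it with dry_run left on first shows how many cookies each profile would lose before anything is actually deleted.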

Deleting IE cache files on close is not a good idea; it will just exacerbate the high Disk Transfers/sec issue and cause more reading and writing. No, the cache needs to be small and preserved until the roaming profile gets purged off with delprofile. I have it set to 8MB now, but will probably bump it down to 1MB since we have caching proxy servers here.

Maybe if I set the cookie cache small enough it will self-maintain its size by necessity...

Hmmmm...

  • | Post Points: 20
Top 100 Contributor
Points 1,837
No worries Chris. What prompted you to do the BBWC upgrade? Did you say that you were running the P400i controller (HP shop)? That ships with 256MB cache and the BBWC adds an additional 128MB (write), yes? That controller supports 300MB/s per connector.

On the Dell PE2950's, I ordered the PERC 5/i SAS controller (256MB cache) which supports 1200MB/s. I set the 256MB DDR2 R/W cache to 50/50 in the BIOS as well. Both are PCI x8.

I'd be relatively surprised if the controller turns out to be the bottleneck. Still, doesn't make sense that the servers would page so much when there was plenty of system RAM available.

Rather than get stuck on that, I should have looked closer at the controller - especially since disk was getting hit hard. I think we know why, and we've covered that in the last few posts. I'm really interested in finding a resolution to that, though improving disk performance may prove to be a more realistic approach. I'm going to validate first.

Found a couple of good links on the process.
http://www.computerperformance.co.uk/HealthCheck/Disk_Health.htm

Since PerfMon.exe and its objects/counters haven't changed, here are TechNet articles (for NT 4.0):

http://www.microsoft.com/technet/archive/ntwrkstn/reskit/procsr.mspx?mfr=true

http://www.microsoft.com/technet/archive/ntwrkstn/reskit/procsr.mspx?mfr=true

Best regards,


Samuel A. Rodriguez
Sr. Systems Administrator

  • | Post Points: 35
Top 500 Contributor
Points 1,097
I have a hodgepodge of servers. My intentions were to have one model of server across the board, but that rarely works out. We buy up offices and I inherit what I can use. Luckily (so far) it's all been HP. The majority of the controllers are 6i on the G4s and P400i on the G5s.

Early in this thread Rick mentioned something about the write cache setting, so I started poking around. Come to find out, I couldn't even adjust it without the battery, so it was set at 100% read with no way to tweak it. I read a few more threads on it and decided to try that out. It was relatively cheap to do and I had nothing to lose.

I popped the batteries in, let them charge for a while, then set the read/write ratio to 50%/50%.

Things are great again today, but I'm not ready to say this was the fix. We have so many things being worked on at the moment that it's hard to tell. Hopefully by Friday I'll know for sure.

I've been burned too many times by thinking I had the issue licked when I didn't.

Senior Administrator (Citrix)
USI Holdings

No matter where I am i'm never where I want to be.

  • | Post Points: 5
Not Ranked
Points 150
[quote=Ross Walker]
I would love to mow down users' cookies on log off, kill em' all, let God sort em' out, but unfortunately users have become dependant on these to remember their site usernames and passwords, sad, sad, but I am exploring a way to purge all cookies not accessed in say 30 days or more, which should help.
[/quote]


I suppose the need to store cookies depends on the type of business you're in. Whilst cookies are a nicety, I'd hardly call them a necessity.

I do agree that a better solution would be to somehow be selective over which cookies are stored. Logistically it would be a nightmare, but a blanket limit of 30 days or 1k cookies could work.

Alternatively, just redirect the cookies to the users' home drives, but you wouldn't really be fixing the problem, just moving it to the file server.

Rgds

Andy Friar
  • | Post Points: 20