High CPU use, CPU Spikes and CPU Freezes - VMs running Citrix PS4 with FR4 (Virtualization + Server-Based Computing forum on BrianMadden.com)

This post has 14 Replies | 0 Followers

Not Ranked
Points 120
Mark Burse Posted: 06-06-2008 7:29 AM
Hello,
I will summarise below:-

Yes only on the Citrix VM's.

First let me supply a few extra details:-

1) We are trying to run 20 x Citrix VM's, on 8 x ESX Hosts (4 x dual core - 8 x CPU's per ESX host - 32GB of Mem). These same ESX Host servers run 150 x other VM's (170 x total VM's).

2) We already use the Citrix build process as per this web site (http://virtrix.blogspot.com/2007/03/vmware-best-practices-for-deploying.html) and follow all instructions except the separate Vdisk for the "Windows Pagefile".
Are you able to advise what benefit placing the Windows "Pagefile" on a separate Vdisk will achieve? (all VM's use SAN LUNS from a HP EVA8100)

3) The issues we are seeing on the Citrix VM's are not seen on any other Windows VM. - we see 3 issues:-
a) - The vCPU spikes to 100% and ICA user sessions hang completely - users then have to end the ICA session and start again.
b) - We have had to power off hung Citrix VM servers - the Windows OS is completely dead. We believe this is because of the CPU, but in the VM world we are not able to obtain a Windows dump report that could be analysed by Microsoft and/or Citrix - any thoughts?
c) - we have set-up MS "Perfmon" and see an "Average CPU Queue Length" of "10" (that's 5 per CPU). MS guidelines are a maximum of "2".

4) ESX Host Server - Memory, Networking and Disk I/O seem to be doing next to nothing.....The average CPU usage is low on all the ESX servers @ approximately 30% as a weekly average.

5) We have set a Citrix user policy of no more than 25 x concurrent users per Citrix server (on physical Citrix servers the policy is set at 45 users) - we have hundreds of users that use MS Outlook via a Citrix shared desktop.

6) On each Citrix VM we have installed a 3rd party application called "AppSense" - which seemed to be almost mandatory, as without it performance was poor to say the least - AppSense is £1500 per VM, so it is proving very costly to scale the ESX farm.

7) We have tested a Windows 2003 VM that has the VMware "Memory Balloon driver" enabled - it has 2GB of RAM (using 36%) and 2 x vCPUs - with only 12 x ICA users connected, all running MS Outlook and Word.
Performance was still bad.
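
To put the figures from points 1) and 3c) side by side, here is a small back-of-the-envelope sketch in Python. It only uses numbers quoted above (a queue length of 10 on a 2 x vCPU VM, the guideline of 2 per CPU, and 170 VMs on 8 hosts with 8 cores each). The function names are made up for illustration, and this is arithmetic on the quoted figures, not a measurement tool.

```python
# Back-of-the-envelope check of the numbers quoted in the post.
# Function names are hypothetical; all inputs come from the post itself.

def queue_length_per_cpu(queue_length: float, vcpus: int) -> float:
    """Average processor queue length divided by the number of vCPUs."""
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    return queue_length / vcpus


def exceeds_guideline(queue_length: float, vcpus: int,
                      max_per_cpu: float = 2.0) -> bool:
    """True if the per-CPU queue length is above the quoted guideline of 2."""
    return queue_length_per_cpu(queue_length, vcpus) > max_per_cpu


def vms_per_core(total_vms: int, hosts: int, cores_per_host: int) -> float:
    """Average number of VMs sharing each physical core across the farm."""
    return total_vms / (hosts * cores_per_host)


per_cpu = queue_length_per_cpu(10, 2)   # queue of 10 on a 2 x vCPU VM
density = vms_per_core(170, 8, 8)       # 170 VMs on 8 hosts x 8 cores
print(per_cpu, exceeds_guideline(10, 2), round(density, 2))
# prints: 5.0 True 2.66
```

The density figure counts VMs, not vCPUs, so the real vCPU-to-core ratio is at least this high whenever some VMs have 2 x vCPUs.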

Any case studies, or customers we could contact who run Citrix successfully, would be greatly appreciated!

Rgds

Mark Burse
Top 200 Contributor
Points 705
Were these Citrix servers P2V'd or freshly installed? P2V sometimes causes performance issues. Have you checked the network configuration and settings on both the virtual machines and the physical side? Is your storage heavily loaded as far as RAID design and disk I/O go? Is the SQL database fully tuned to best practices? How much RAM and how many vCPUs did you assign to each Citrix VM?
Stefan Nguyen
VCP, CCEA, CCA, CCNA, MCP, A+
iGeek Systems LLC.
Citrix/VMware/Sharepoint/ Consultant
The Power of Knowledge
stefan@igeeksystems.com
  • | Post Points: 5
Top 200 Contributor
Points 705
FYI: Kaiser Permanente is one of the top 5 Citrix users in the world, with approximately 10,000 servers worldwide and 250+ applications, and they are running a large Citrix farm as well. I'm pretty sure there are larger organizations out there using Citrix/VMware without any issues; you just need to look in more detail at performance and health-check procedures and implement them from there.
Stefan Nguyen
  • | Post Points: 5
Top 200 Contributor
Points 705
As you can see from Mark's response on this forum:

Migrated physical MPS 4.5 servers to VMware. Original hardware was HP DL380 G4, dual core 3.2 Intel P4, with 4GB RAM and Ultra320 SCSI. NICs were teamed for 2Gb throughput.

Since the migration, users cannot tell that any migration took place. Since we are now all virtual, we were able to add another MPS server to help with the load, but performance monitors show that it wasn't really needed. We added the server more for Citrix HA than for performance.


Pretty vanilla in the applications used via Citrix....MS Office 2003 and a few SQL database based programs where the databases resides elsewhere on a SQL server. Nothing really intense.

VMware environment: ESX 3.5 with HA and VMotion. Dell 2950 quad-core servers connected to Dell/EqualLogic iSCSI SAS SANs. RAID50 on the SANs.

As you can see, we had the right combination of hardware and VMware infrastructure to make this work out nicely for us.
Stefan Nguyen
  • | Post Points: 5
Not Ranked
Points 120
Stefan,

All Citrix VM's are Clean builds, from a std template.
Storage is SAN based and is doing absolutely nothing......
2GB RAM, 1 and 2 vCPUs - HP BL685 servers - 4 x dual-core AMD - 3GHz...
  • | Post Points: 20
Not Ranked
Points 35
Hi Mark,

I need to tell you that you are not alone.

We have a very similar setup with HP 385's with 2x dual core AMD's with 16gig of memory connected to an HP EVA 8000 and getting the same results.

Our original spec was 4 virtual machines / physical with 20 users per host.

We followed best practices (splitting drives, reg key, VMware tweaks, etc.) and the best we were able to get was 5 users per virtual Citrix server.

We have had a case open with HP, VMware and Citrix for over 4 months and no one has been able to pinpoint the problem.

It's frustrating that everyone tells us it can be done, and yet with all this high-end gear we cannot make it happen.

If you are OK with it, we should get in contact with each other, compare notes and see if we can figure out what's going on.

As a side note, we had to press on with the project, so we scrapped VMware for now and went with bare metal. After extensive testing we were able to get more than 100 users per bare metal box using the following configuration.

HP DL385
2 x dual-core AMD Opteron 2.6
16GB of memory
Mirrored C: 15k, OS/apps
Mirrored D: 15k, swap/apps
Windows 2003 Enterprise
Running mostly office applications and Mainframe Terminal Emulation software.

Jon


  • | Post Points: 20
Not Ranked
Points 120
Hi Jon,

Yes please email me - mark.burse@ocsl.co.uk

Rgds

Mark
  • | Post Points: 5
Not Ranked
Points 80
I am working on a project where I will be building a new Citrix farm on VMware. I only have around 200 users, and I have 10 HP DL380 G5s with 2 x quad-core CPUs and 16GB RAM.

I also have around 12-15 servers which need to be P2V'd, so I am a little concerned about the performance issues you mention.

I have built a Citrix farm as a POC and the small number of users on it reported performance as very good, but I am concerned when you mention that you are only getting 5 users per Citrix VM.

I am keen to hear of any pitfalls, and if you want my personal details let me know, as I work for a large PLC that has good access to VMware and Citrix technical experts.

Please keep me posted!

Thanks,

Kevin
  • | Post Points: 20
Top 500 Contributor
Points 420

Mark,
If you're still having this issue, I'd like to make a recommendation or two.
Regarding:
3) The issues we are seeing on the Citrix VM's are not seen on any other Windows VM. - we see 3 issues:-
a) - The vCPU spikes to 100% and ICA user sessions hang completely - users then have to end the ICA session and start again.
b) - We have had to power off hung Citrix VM servers - the Windows OS is completely dead. We believe this is because of the CPU, but in the VM world we are not able to obtain a Windows dump report that could be analysed by Microsoft and/or Citrix - any thoughts?

--------------------------------------------------------------------

A) If possible, it would have been interesting to get an instance of Kernrate in there, or even Process Monitor, so we could see what app/driver is causing the high CPU usage.
B) There is a way to generate a dump from a session; there's a utility out there that can do this. I believe it's on Citrix's site. The dump could show runaways.

Hopefully you get this resolved, and when you do, post back your findings for the rest of us!

Thanks,
--Mike
  • | Post Points: 5
Not Ranked
Points 150
Running CPS4.5 FR4 on vmware farm all client connections using Pnagent. Rollup hotfix 2 not installed.

VMware specs

Dell Poweredge 2900
2 x Quad Core 2.66 GHz
24GB RAM
ESX 3.5 update 1

I am experiencing sessions that go to a down status. I can reset the down session, and while it disappears momentarily, it comes back. This pegs the CPU at 100% and prevents other sessions from logging on to the server that is experiencing the issue. Performance-wise, the entire farm is affected as well, but not to the extent of the individual server.

For now, the fix I have found is to disable logons and basically wait it out until the session finally resets. I can hard-reboot the server, but that affects all the other users who are working normally, which is of course undesirable.

In the event viewer there were lots of tamper protection errors from Norton. I've disabled tamper protection. Other than that the event viewer did not have other errors.

Somedays I hate Citrix
20 CPS 4.5 on VMware 3x
  • | Post Points: 5
Not Ranked
Points 150
As I sat posting this I was experiencing the issue.

The CPU spike started at 7am and lasted until 10:30am. From 9:30am until it recovered I had logons disabled and had reset the down session. It eventually recovered and I did not have to reboot.

Nothing in the logs.

Another condition is the session count drops way low, which is my indicator that something is not right.
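
That indicator can be turned into a simple check. Below is a minimal sketch in Python, assuming the active session count per server is already being collected at regular intervals by some means. The function name, the window size and the 50% drop ratio are illustrative guesses, not tuned values.

```python
from statistics import mean

# Sketch: flag a server whose session count suddenly falls well below its
# recent average. Window and ratio are hypothetical defaults.

def session_drop_detected(counts, window=6, drop_ratio=0.5):
    """True if the latest count is below drop_ratio x the mean of the
    previous `window` samples."""
    if len(counts) < window + 1:
        return False  # not enough history to form a baseline
    baseline = mean(counts[-(window + 1):-1])
    if baseline == 0:
        return False  # an empty server cannot "drop"
    return counts[-1] < baseline * drop_ratio


print(session_drop_detected([24, 25, 25, 23, 24, 25, 8]))   # sharp drop
print(session_drop_detected([24, 25, 25, 23, 24, 25, 22]))  # normal variation
```

A check like this only says that something changed; it would still need to be tied to the CPU counters to tell a spike apart from users simply logging off.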

  • | Post Points: 5
Top 10 Contributor
Points 15,109
Hi,

Check that you are using the correct HAL in the VM. If you use a multi-processor HAL with a single vCPU VM, you'll have issues with performance for sure. For a single vCPU VM, the HAL should be Uniprocessor (unless you have hyperthreading enabled within ESX).
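
As a rough sketch, that rule can be written as a check in Python. It assumes the HAL name is read by hand (for example from Device Manager under "Computer") and contains the word "Uniprocessor" or "Multiprocessor", as in "ACPI Uniprocessor PC"; verify the exact strings on your own build. The function names are made up, and the hyperthreading exception mentioned above is deliberately ignored.

```python
# Sketch: does the HAL type match the vCPU count?
# HAL name strings are assumptions; check them on the actual VM.

def hal_kind(hal_name: str) -> str:
    """Classify a HAL display name as 'uni', 'multi' or 'unknown'."""
    name = hal_name.lower()
    if "multiprocessor" in name:
        return "multi"
    if "uniprocessor" in name:
        return "uni"
    return "unknown"


def hal_matches_vcpus(hal_name: str, vcpus: int) -> bool:
    """True if a 1 x vCPU VM has a uniprocessor HAL, or a multi-vCPU VM
    has a multiprocessor HAL."""
    kind = hal_kind(hal_name)
    if kind == "unknown":
        raise ValueError(f"cannot classify HAL name: {hal_name!r}")
    expected = "uni" if vcpus == 1 else "multi"
    return kind == expected


print(hal_matches_vcpus("ACPI Multiprocessor PC", 1))  # mismatch to look into
```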

Also, use perfmon or Process Explorer to look at context switches. Process Explorer has a delta metric (View -> System Information for all processes) which lets you see real-time context switching - pretty handy. If you have high context switching (>12000 for a single vCPU VM), then you need to investigate that further (again, Process Explorer will help). I disable any unneeded virtual hardware in the BIOS of the VM (floppy controller, secondary IDE controller, COM ports, etc.), which cuts down a lot on context switches caused by hardware interrupts.
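
For anyone logging these numbers rather than watching them live, here is a minimal sketch in Python of the threshold check. Two things are assumptions, since the post does not state them: that the 12000 figure is per second, and that it scales linearly with the number of vCPUs. The function names are hypothetical. If you read the perfmon counter "\System\Context Switches/sec" you already have a rate and can skip the first function.

```python
# Sketch: turn two cumulative context-switch samples into a rate and
# compare it with the rule of thumb above (assumed to be per second).

def context_switch_rate(prev_total: int, curr_total: int,
                        interval_s: float) -> float:
    """Context switches per second between two cumulative samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if curr_total < prev_total:
        raise ValueError("counter went backwards (reset or reboot?)")
    return (curr_total - prev_total) / interval_s


def needs_investigation(rate_per_s: float, vcpus: int = 1,
                        threshold_per_vcpu: float = 12000.0) -> bool:
    """True if the rate is above the threshold scaled by vCPU count.
    Linear scaling is an assumption, not something the post states."""
    return rate_per_s > threshold_per_vcpu * vcpus


rate = context_switch_rate(1_000_000, 1_150_000, 10)  # made-up samples
print(rate, needs_investigation(rate))
```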

Also, if you P2V'ed any VMs from VMware Server to ESX then you may have debugging enabled on the VM within ESX. That can have a huge penalty on performance.

Make sure you review this webpage too:

http://virtrix.blogspot.com/2007/03/vmware-best-practices-for-deploying.html

Cheers,

Alan Osborne
President (MCSE, CCNA, VCP, CCA)
VCIT Consulting - Citrix/Terminal Services Remote Desktop Solutions for SMB
p: 604-288-7325
c: 778-836-8025
web: http://www.vcit.ca
blog: http://www.vcit.ca/wordpress

  • | Post Points: 20
Top 150 Contributor
Points 1,075
We can get about 20 users on a VM (whether via Published App or Published Desktop), and more than double or triple that on a physical server. We are committed to VMware, but it is not the perfect solution on the production side of the house; development, testing, etc. have a bigger ROI. From what I've read, 20 users is the optimum for most production environments.
  • | Post Points: 5
Not Ranked
Points 5
Hey Mark,
Have you gotten any further on this issue? We are in the same situation and experiencing the same issues. I have run extensive CPU tests and testing in general but haven't been able to figure it out.

Our infrastructure is pretty much the same as what you have listed, though we are running on Unisys rather than HP hardware.

Any info would be appreciated. I have followed all the white pages and