Citrix scalability problems at Kaiser Permanente? What's really going on?

Linda Rosencrance wrote an article in Computerworld that highlighted some trouble that Kaiser Permanente is having with the Epic patient management system they're deploying via Citrix Presentation Server.

The article came out on Monday, and I've personally received three separate emails from people in the community about it, all saying basically the same thing: "Do you know anything?" and "I don't want my boss to find out that Citrix isn't scalable."

Therefore I'd like to address the Computerworld article that's causing all this, as well as my thoughts about it.

First, let's look at the Computerworld article. It's four pages, but it talks about all sorts of aspects of the project. (i.e. It's about more than just Citrix.) Here are the Citrix-related bits if you don't want to read the whole article:

  • Kaiser is implementing a multi-billion dollar patient management system based on Epic.
  • This implementation is for more than 100,000 users.
  • Citrix Presentation Server is being used to provide access to all users--both internal and external.
  • This is Citrix's largest single-customer deployment of Presentation Server.
  • Some people at Kaiser think that Citrix is not the right solution for this project.

The article quotes some Kaiser employees saying some pretty damning things about Citrix. However, the main source of quotes is Justen Deal, and Deal is not an IT employee--he's a project supervisor in Kaiser's Health Education department. This is weird because Deal's quotes are about IT architectural issues, such as:

We're using it [Citrix] in a way that's quite different from the way most organizations are using it. A lot of users use it to allow remote users to connect to the network. But we actually use it from inside the network. For every user who connects to HealthConnect, they connect via Citrix, and we're running into monumental problems in scaling the Citrix servers.

Using Citrix is something that defies common sense. It would be like trying to use a dial up modem for thousands of users. It's just not going to work, and it's not something anyone would tell you a dial-up modem should work for.


Back in 2000, I worked with a different hospital chain that decided to standardize with Citrix to deliver Epic to all of their hospitals. In those days, we used the term "internal application service provider" to describe the role that IT was taking in providing access to applications via Citrix. Since then, I've personally worked with dozens of companies (and I know for a fact there are thousands) that use Citrix Presentation Server to provide access to internal applications on the LAN in addition to providing access externally. After all, whether internal or external, standardizing application delivery on Presentation Server gives you easier management of applications, avoids desktop management and incompatibilities, allows for shadowing, facilitates roaming between multiple client devices, and offers countless other benefits over the "old" way of installing three-tier client apps on thousands of desktops.

So what's the deal with Kaiser's Citrix implementation? The article quoted Scott Herren, VP of the group at Citrix that includes Presentation Server, saying the issue at Kaiser was not scalability but rather that Kaiser did not architect their environment properly. He also said that Citrix, as well as other IT vendors, are working with Kaiser to get the problems corrected. Finally, he pointed out that Citrix Presentation Server powers many huge deployments of Epic.

The bottom line is that this sounds like a pretty standard "nightmare" IT project. When the article quotes people saying that Citrix isn't scalable, you have to take into consideration the source of the quote, as well as the fact that they're talking about one single environment with more than 100,000 users. This shouldn't be construed more broadly to mean that Citrix has scalability problems. If it's done right, you have nothing to worry about.

Join the conversation


Thanx for clarifying and bringing your view to this. These articles seem to start ripple effects that make people nervous, even when nothing is wrong.
Always nice with an 'objective/different' view on things :-)
I'd love to be the salesperson who made the hardware sales for that project. I'd be ordering my Ferrari or something to that effect.
I think any experienced Citrix tech has been here before, although probably not on this scale.
I've had a project hit the media due to "Citrix problems"...they very rarely are a problem with the product, but usually with the project comms or poor design / architecture.  In my case I had board-level minutes going back 9 months prior to the problem...basically "I told you so..."
Yes it causes ripples, and can be very frustrating and occasionally damaging.  If you want something to blame KP, I would be looking at the project team (inc. IT), not a third-party software product.
to learn more about how the environment is configured.  
I find the comment amusing in the article by Deal where he says
"I don't think that Citrix really appreciates what we're trying to do with their software," Deal said. "I don't think ... we have any from them because they would not guarantee that this was going to work at all for this implementation."
Citrix doesn't appreciate what they are trying to do with their software?  No SLAs?
Through reading the article, it sounds like the folks at Citrix recognized that
1) the app might not work well in such a huge environment and
2) later Citrix questioned how the whole system was architected
I'm sure there were politics involved - but for something this huge,  wouldn't you use Citrix to actually do the architecture/implementation for a project of this size?   Anything else would be foolish. 
This fellow Deal is the dangerous sort in that he seems to know nothing at all about Citrix, and yet he is eager to be quoted. 
The final quote in the article is telling.  "I know in conversations I've had with my superiors there was a big push back in selecting Epic, and it was not a choice made by IT simply because of the large infrastructure needed to support it."
I think it is important to always remember that IT does not run the show.  Once the call is made, you strap on the boots and you make it work.  If people in IT are frustrated, then perhaps they need to get off the soft cushion they have fashioned for themselves within Kaiser, and go elsewhere to earn a living. 
Seems to me Deal, the anonymous employee and any other individual frustrated by the opportunity of implementing the largest Citrix SBC solution in the world should not let the door hit them in the *ss. 
You sign on to support the customer (managers are customers), and the binding agreement is a commitment to success.  Deal has broken the deal, and if it were my call he would be out.  It doesn't take genius to recognize there is a problem, genius is recognized by solving the problem, and Deal is no problem solver that is clear.
Where have all the go-getters gone?  Good luck Kaiser! 
The dude is into Dolly Parton and Martha Stewart, so the Citrix thingy is probably the least of his problems
Sounds to me like the same problem we had when we implemented Citrix in our environment. The medical staff suddenly noticed they could be measured on their computer usage and couldn't just say we need more computers to do our work. That made a lot of noise the first few months, then calmed down.
Obviously a few ignorant employees... 

I noticed the article keeps stating power outages.  How is this a Citrix scaling issue?  More like an architecture issue.  No DR site?

My guess is that politics dictates how things are done and that is the root cause of KP's problems.  Of course, the article never once stated what the root cause of the Citrix outages was.


Yeah, the article never does say that each of the outages was actually caused by Citrix (or Epic, for that matter). For all we know they could've lost a WAN link to a remote office, and then that's now listed here as an outage that prevented people from getting their medicine or whatever.

I did think it was interesting that they listed half a dozen outages here and there. Now I know it's dangerous to draw conclusions based on this short article, and I haven't seen the full 722 page report or whatever it is that they have, but I think that outages at a few remote sites here and there aren't that big of a deal to a company with 100,000+ users scattered throughout the country.

I mean outages are no fun. But think about it.. Probably any company will look bad if you take all of their departmental or single app outages over a year or whatever and compress them down into a single bullet list.

I'm sure everyone can relate to the fact that in just about ANY organization, there will be a few users who are the most vocal.  When something goes even slightly wrong for them, they are the first ones to blame the technology (usually Citrix), especially if it is new to their environment.  This Deal fellow almost sounds like he was a Citrix nay-sayer from the get go.  He certainly comes off sounding like he doesn't have a clue about it.
After reading the article I was left wondering - is Computer World the gossip magazine of the Computer Industry?
There is something very National Enquirer about an article written based on information from someone who lacked
common sense so much that he wrote an email making allegations and sent it to his entire 100,000 user organization.
Not a bright career move!
I am glad that someone like Brian wrote a response on it. It is hard when you work with highly educated people as
we do here in the medical field. When they read information from what appears to be a credible source it is often
hard to sway them from accepting statements made in the article as fact even if they don’t understand the concepts
behind it. You have to offer an equal or more credible source that points out the flaws of the article they read in
order to back your position.
Thank you Brian for saving me from endless debates and discussions about statements that are just plain not true.
Being an Epic customer myself (running via Citrix) I can't imagine Kaiser moving away from Citrix back to the good ole days of local installs.  If they think they have problems now!!  After reviewing the ComputerWorld article I wonder how much of that stuff is Citrix related.
Did anyone else notice how, according to Computerworld, there was "... a power outage on May 9 at Kaiser's data center in Corona, Calif., lasted for 55 hours and 7 minutes".  Then two pages later we have "On May 10, a power outage that lasted for 37 hours and 9 minutes".  I'm fairly sure I would have remembered seeing 92 hrs downtime in two days, but I'm drawing a complete blank there.  It kind of makes me question the rest of the report. 
Elsewhere Mr Deal would have you believe that Kaiser are running only 12 concurrent sessions per server; by my count that means over 3600 servers for the Epic system alone.  When you add in the pre-prod and training environments and the servers for all the other apps that KP delivers via Presentation Server one thing really stands out - I'm going to have to ask for a pay rise.   The truth is far less spectacular than some people would like to believe.  KP has a large Presentation Server environment. I don't believe we are Citrix's largest customer, top 5 maybe, top 10 almost certainly, but I've never heard anyone at Citrix say that Kaiser are their biggest customer.
Is Presentation Server scalable?  We run 40,000 concurrent users today and are adding more every week, I'd say that was pretty scalable.  Can you scale a single farm that big? I don't know, I've never tried and I'm not about to. 
Simon Bramfitt
Nice to see that you have replied to this article Simon...

As someone who has been in every fire in the past 20 months, I have to add my 2 cents.

We are ~ 40000 and growing strong.  I am not saying that we have not had our growing pains (including our old 1000+ server farm).  But we have been able to work hard and creatively to keep availability up on our farms.  Many of our problems have not been Citrix related.  They span from network issues to Antivirus issues.

With our creative MFCOM and WMI scripting, we are able to solve the daily server issues with little to no impact to our doctors.  For the more complex infrastructure issues, we have worked along with Citrix using CDF tracing and Netmon captures to find anomalies in our environments.

I am grateful and honored to be part of this ambitious project.

It’s been an amazing journey so far.  I can't wait to tackle our next issue tomorrow :)

Allen Au.

This is the common mistake made by many organisations... "this is just another set of Windows servers". Most of my customers who thought like that ran into some kind of project management problem.
As soon as you think of it as a "Windows mainframe" and put proper process and qualification in place, you do not get into this type of trouble.

Just to also add that "this is the largest single customer implementation" is quite false... we have 140 000 seats running in a single customer... what is true is: "this is a very huge implementation, one we do not see every day ;-)"

This is what in my opinion too often happens at companies. Some rookie manager thinks he/she is a genius by acquiring a package and placing it on Citrix. Then they figure out it ain't working and blame everybody except themselves. I've seen it happen more than once where a package is bought and the IT department has to fix it so it works on Citrix.
Citrix is scalable as far as I know... why else would we use it? If the package they use isn't scalable, that is really their problem and not Citrix's.
Microsoft, Citrix and Epic are companies dedicated to making their technology work. And work well.
You will not find this level of commitment, dedication and time in the industry from other vendors with like products.
Get real Mr. Deal.
You sound like a Desktop tech who wants all applications to be installed and used from the PC Desktops. But is that scalable? You should do the math first.
I think it's great that you responded this way Brian, and that you've got plenty of us "Citrix Pro's" supporting you.
Projects will always fail if the key stakeholders are not on-side. I couldn't quite work out if Mr Deal is a key stakeholder or not, but once he's off-side or does not assist with the risk management, then the project was never going to be successful unless he was removed from this role. One would have thought that Citrix would have escalated this to more senior stakeholders. Clearly, from the article, Mr Deal actually knows little about how Citrix is used in the real world.
But being such a large project, I would have thought that Citrix consultants would have been involved from the beginning to ensure that the environment was architected correctly. Perhaps this did happen, but we are really only seeing a small part of the story. I just hate seeing any bad press about Citrix. Implementations fail because of bad planning, and techs/consultants/architects that are not the real deal.
If they have many Data Centers running 24/7, then how did a 55 hour power outage at one Data Center cause such issues? It also sounded like they had some WAN issues with staff at certain facilities not being able to connect.
We've all been here before:
User attempts to launch application via Citrix and is unsuccessful; therefore it must be a Citrix issue!
99% of the time it’s not a Citrix issue (aka power, network, and/or backend application outage, etc).
All I can say is consider the source…
Thanks for posting this Brian :)  Looking forward to BriForum 2007!
I have been a successful Citrix guy so far. In my opinion, any technology has some limitations.
But for Citrix / remote access, if some people say scalability is an issue, that is really funny.
I remember reading an article about why big  projects fail, here is one of them
I like the question 5 should be read as Citrix Admins!!!
I read the Computerworld article as well. I couldn't believe what I was reading. I have worked with Citrix environments for the last 10 years. While none of them are this large, it still has nothing to do with scalability. It has to do with poor architecture. The article doesn't say what Deal's role was, just that he was a Project Manager. It does sound like it might have been his project. Hmmm maybe that is why it isn't working right. In every Citrix environment I have managed, Citrix gets 99 percent of the blame, when it is really only a Citrix issue 1 percent of the time. Yes I have applications that do not behave well. I just have to scale and configure those servers differently. Deal is reported to be 25 years old. He has been working there for 2 years. That puts him at 23 when he started. I have known a few 16 year old geniuses in this business and I can assure you after reading his comments and his blogs, he is not one of them. It really was poor judgement putting someone of his caliber in a management position. It was also poor judgement reporting on him without doing the background research. What are his credentials? Where did he learn about Epic? Citrix? Best of all, when? He is only 25 years old and also has poor judgement. Whew I could go on forever
Yeah you gotta watch out for those kids...
As an architect, I've put MetaFrame (or whatever we're calling it these days) into use for clients big and small.  Never 100,000 big, but certainly big.  So I do consider myself a fan of Citrix, when it's being used in the right situations.

But, my, oh my, I've never seen you folks so disjointed before.  Silently lurking has been my thing, because quite frankly, I've been fairly selfish in using the site as a resource, and have never seen fit to add my opinion before.  Typically, in fact, opinions aren't necessary on here, it's the facts that count...

Which is why this particular story is such a black eye for all of you.

First of all, as best as I can tell, Deal is, in fact, an IT employee.  It sounds like Kaiser's PR folks realized the best way to neutralize him was to say otherwise.  It looks like we fell for that hook, line, and sinker.

Well, everyone except for Scott Chiara, who says the kid is a desktop tech who favors hand installs.  According to his blog, he actually seems to prefer hardware thin clients:

And, then, there's M. Wilson Fox, who seems to think the mainstream media and Deal both have poor judgment.  Now Fox, based on speculation, thinks that Deal must be running the Citrix project at Kaiser Permanente.  Now, based on "comments" from Deal's blog and, I assume, to the media, Fox is convinced that Deal is not a "16 year old genius".  As best as I can tell, he's actually 25, and I don't know of anyone that's called him a genius.

Guys, I'm all for holding people accountable for what they say.  But what has Deal said?  KP is getting 12 users per Citrix server.  He didn't attack Citrix itself, as far as I can tell, just KP's Citrix architecture, and, apparently, the fact that Citrix sold it to KP in the first place.

According to what I've read in most places, KP has 13,000 average Citrix users, with a shift-change peak of about 26,000.  Yet, KP has 2,000 Citrix servers, which, according to KP and Citrix's own admission, still isn't enough for their CURRENT demand.  At 13,000 users, that's an amazing 6.5 users per server, while at 26,000 users, that's an impressive 13 users per server.  (Apparently they've been able to trace a good deal of the problems to their XML Broker setup, which makes sense given its...sensitivity.)
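
The per-server figures in the paragraph above are simple division. Here is a minimal Python sketch of that arithmetic, using only the numbers quoted in this comment (2,000 servers, 13,000 average users, 26,000 peak users). The function name is made up for illustration, and the inputs are the commenter's figures, not verified ones:

```python
def users_per_server(users: int, servers: int) -> float:
    """Average number of concurrent users carried by each server."""
    if servers <= 0:
        raise ValueError("servers must be positive")
    return users / servers


# Figures as quoted in the comment above (not independently verified).
SERVERS = 2_000
AVERAGE_USERS = 13_000
PEAK_USERS = 26_000

average_density = users_per_server(AVERAGE_USERS, SERVERS)
peak_density = users_per_server(PEAK_USERS, SERVERS)

print(f"average: {average_density} users per server")
print(f"peak:    {peak_density} users per server")
```

The result is only as good as the server and user counts fed into it, so a wrong server count changes the conclusion entirely.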

It seems pretty clear that Kaiser Permanente really has screwed something up here.  And, instead of sending your resumes over to KP headquarters, you guys are shooting the messenger (with some really cheap shots and low blows, i.e. Martha Stewart and Dolly Parton).

Now, I'm all about critical analysis.  But this time around, you boys bypassed analysis and went straight for the jugular.  Jaded and cynical?  Or threatened by a 25 year-old?  Maybe both, and maybe I'm not much better.  But you've all done yourselves, the Citrix community, this kid, and KP a disservice by tossing a good opportunity to take a good look at a mega Citrix install that just isn't going well (actually, the mega-est install).

Shame on each of you, and on me for not saying something sooner.

Nicely put Jim, you have many good points.
One of the Kaiser Citrix Architects was at iForum this year and was a speaker in one of the sessions. He did talk about the XML broker and how they approached load balancing it.  From what I gathered it was a custom design, developed there at Kaiser.  I'm sure there's more to it, but that's all I can remember.
And I too have never architected a Citrix EMR environment that large and cannot say how well it will work. But I would like to know more about the Kaiser deployment and how it is architected, someone should write a detailed technical paper. So we can all learn about it.
As far as Mr. Deal goes, the only thing I was disappointed in was that he did not provide detailed information about what was going on there. I felt like he was speculating on many of his points.
Hey Jim
Maybe a few facts might help clear the air a little.
Kaiser does not have 2000 Citrix servers, we actually have about 1450, of which, today, 1044 servers are deployed in the production farms.  The remainder are assigned to engineering, development, P&S testing, training and general pre-production activities.   On Tuesday this week we peaked at 40,210 production users.  A significant number of the production servers are assigned to infrastructure roles (dedicated ZDCs, backup ZDCs, Farm Metric Servers, XML brokers etc.) which reduces the number of servers actually hosting production applications to less than 1000.   Beyond that, we support users from GA through to HI.  This means that when a server in CA is at peak load, a GA server is running at about 80% of peak load and a HI server is only running at about 5% of peak load.
The net is that peak utilization per server averages out at about 65 sessions, with individual applications ranging above and below that figure.  We can in fact run hotter than that, but we try to run at less than the maximum possible to provide some head room to cover us in the event of possible server failure.
FYI - the 2000 server figure was from our HR dept. and I've no idea where they got that from, maybe they got confused with Windows 2000.  And if anyone does want to send in their resume I'd be delighted to receive it.
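
The figures in this comment can be sanity-checked with a few lines of arithmetic. This is a rough Python sketch, assuming exactly 1,000 application servers (the comment says "less than 1000", so the real average would be somewhat higher). The 65-session per-server peak and the 100%/80%/5% regional loads are taken straight from the comment, and the variable names are illustrative only:

```python
# Figures taken from the comment above; APP_SERVERS is an assumed upper bound.
PEAK_PRODUCTION_USERS = 40_210
APP_SERVERS = 1_000
PER_SERVER_PEAK_SESSIONS = 65

# Load on a server in each region at the moment California is at peak,
# as a percentage of that server's own peak load.
LOAD_PCT_AT_CA_PEAK = {"CA": 100, "GA": 80, "HI": 5}

# Average sessions per application server at the farm-wide peak.
farm_wide_average = PEAK_PRODUCTION_USERS / APP_SERVERS

# Sessions on one server in each region when California is at peak.
sessions_at_ca_peak = {
    region: PER_SERVER_PEAK_SESSIONS * pct / 100
    for region, pct in LOAD_PCT_AT_CA_PEAK.items()
}

print(f"farm-wide average at peak: {farm_wide_average:.2f} sessions per server")
for region, sessions in sessions_at_ca_peak.items():
    print(f"{region}: {sessions:.2f} sessions per server at CA peak")
```

The farm-wide average comes out lower than the 65-session per-server peak because, as the comment explains, the regions do not all peak at the same time.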
I just got done reading the Computerworld article.  What it sounds like to me is that the IT engineering/Ops people in KP got fed up trying to make Epic work in such a large deployment and started complaining to their PM Mr. Deal.  I understand how frustrating it can be to KNOW that deploying any decently designed application gets you 40-70 users per server.  To be limited to 12/server really gets my goat because I've got the same exact thing going on with an application that our business runs.  Which by the way was forced on us by our IT executive management group and ended up being one of the last fatal mistakes of a CIO now relegated to the "special projects" cube.  The flak that Citrix is catching seems to be an unfortunate byproduct of Epic's inability to scale, not Citrix's.  My unfortunately crappy app only allows me to get 6-8 people per server.  Oh, all the vendors promised us that just wasn't the case and they'd save the day.  VMware, RamSan, IBM, AppSense, they all left our building scratching their heads or crying.  Thank God we hardly ever have more than 3-4 people at a time in it trying to do something per server.  If I had a 100k+ deployment of this particular application I think I'd kill myself.
Several months later I drop my two cents, for the edification of those who will come later.
What Simon didn't mention is the intensive level of change that takes place in the environment.  During 2005/2006 (the time frame referenced in the article) the number of changes to the various regional implementations of Epic was nothing short of breathtaking.  In an environment where you have 1000+ production servers and you are introducing weekly changes on even a small number of them (several hundred at a time) it is impossible to maintain any level of real stability.  That level of change was mandated by the business unit in charge of Epic and did not adhere to the recommendations that Simon and his team (or Citrix) had been making for some time.  Did you know that the IM database (at the time) could fill up?  Neither did Citrix until KP did just that and was forced to perform manual weekly updates on the 1000+ servers while Citrix scrambled to fix it.  There were power problems not directly related to the terminal server environment (there were 14,000 servers total) that resulted from poorly planned growth (switched from 8U single servers to mostly blades, didn't upgrade the A/C).  Those issues persisted well into 2006 (and may still).
With that said, there were initially some growth related challenges with the Presentation Server environment that were addressed via design changes that had to be filtered through no less than 5 different sets of "experts" including CCS.  The XML challenges that several people have mentioned were preceded by 4 different variations of the "black hole issue" that was prevalent at the time.  This was all compounded by the fact that the upper management at the time motivated by way of firing and restructuring (there were 3 in less than a year during that time frame) so managers were under tremendous pressure to get things done and the decision making suffered for it.
As I mentioned to the citrix team at my current solutions provider when this first came out, Kaiser is the perfect example of just how scalable Citrix really is.  Simon can confirm these numbers, but as I recall KP grew from 200 terminal servers to 1400 in just over a year and a half.  The fact that those kind of growth numbers were achieved to begin with is the perfect sales pitch for scalability.
And let's not forget that KP drove at least 20% of the product development for PS4 and the next two or three releases.  Citrix may have left KP hanging once or twice (or three times) but they have, over time, addressed the issues that came up and the resulting product is the better for it.  Citrix is not perfect, but it is scalable.  Epic on the other hand...
He worked in the Education and Training department at KP, and held a supervisor job and primarily dealt with creating flyers, brochures, etc., and any other education and training related material.  I am not trying to down-play his position or role in KP, I am just trying to clarify that he never worked in KP-IT, or acted as an actual IT Project Manager (or any other IT department for any other organization that I am aware of).  I just stumbled onto this article earlier this evening.