How should Citrix integrate VoIP with Presentation Server?

For years people have been talking about the convergence of the mobile office. Specifically, they've been talking about how VoIP software will allow telephone service to be provided just like any standard IT application. And whenever there's talk about providing access to an application, there's talk about Citrix.

The question of how Citrix Presentation Server will support and/or integrate with VoIP software is as old as VoIP itself. That said, now that Presentation Server is very mature and supports features like bidirectional audio, this question is once again in the spotlight.

To that end, Citrix Presentation Server product manager Orestes Melgarejo has written a very important blog entry called "To VoIP, or not to VoIP." In it, Orestes talks about the Presentation Server VoIP question and presents three different ways that Citrix could integrate these two technologies.

As you know, I'm not typically one to simply link to other people's blogs, but in this case, Orestes asked for the community's thoughts two weeks ago, and so far there haven't been any responses.

Therefore in this article, I'm summarizing Orestes' thoughts and asking for your feedback on both his and the community's behalf. And if you want more details on this topic, I encourage you to read Orestes' original blog entry.

Citrix Presentation Server's VoIP Integration Options

When we talk about VoIP delivered as an application, we're typically talking about software programs that use a computer's local speakers and mic (or a headset or handset) as a telephone. This software is colloquially known as "softphone" software.

So in this case, the heart of the issue is that you have softphone software, and you have Presentation Server. Now what?

There are three options:

  • Install the softphone software on your Presentation Server. Users access it through ICA along with the client mic and client audio virtual channels.
  • Install the softphone software natively on the client device. The softphone communicates with the VoIP backend directly, bypassing Citrix.
  • Use a hybrid approach. Install the VoIP software on the client directly, but configure it so that its native SIP network traffic is wrapped in ICA and sent through the Presentation Server.

Let's look at each of these in a bit more depth, along with the pros and cons of each method.

Install the VoIP softphone software on the Presentation Server, access it remotely via ICA

This method is something that a lot of people have tried. In fact, this is one of the primary examples that Citrix touts when they talk about the virtual loopback adapter features of Presentation Server. In this case, the softphone is literally just like any other Windows application that's accessed remotely via an ICA client. This is truly a complete office via ICA.


Pros:

  • It will work on any client device that can run ICA with bi-directional audio.
  • No additional client-side software installation.
  • It will work from anywhere, since the user will already have an ICA connection open to the server. No need to worry about ports, firewalls, etc.
  • This would be a nice integrated package that would provide "everything" a remote worker needs. As Orestes says, "The entire office would follow the user."


Cons:

  • The ICA protocol wasn't designed with realtime audio in mind, so there will most likely be performance problems.
  • There will most likely be audio quality problems.
  • Since the softphone software would be physically distant from the mic and speakers, its echo cancellation and other features might not work.
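
To make the echo cancellation concern a bit more concrete, here is a rough sketch of the reasoning. An echo canceller can only remove echo that comes back within a limited window (often called its tail length). If the softphone runs on the server, the path from "audio sent to the speaker" to "echo heard on the mic" includes two trips across the network. Every number below is a made-up illustration, not a measurement of ICA or of any real softphone, and the function names are hypothetical.

```python
# Rough sketch: how long does the echo take to come back to the canceller?
# All delay values are hypothetical placeholders chosen for illustration.

def echo_path_delay_ms(network_down_ms, playout_buffer_ms, acoustic_ms,
                       capture_buffer_ms, network_up_ms):
    """Total delay between the softphone emitting audio and hearing its echo."""
    return (network_down_ms + playout_buffer_ms + acoustic_ms
            + capture_buffer_ms + network_up_ms)


def canceller_can_cover(delay_ms, tail_ms):
    """True if the echo arrives inside the canceller's tail window."""
    return delay_ms <= tail_ms


TAIL_MS = 64  # hypothetical tail length of the softphone's echo canceller

# Softphone installed on the client: no network hop between it and the audio hardware.
local = echo_path_delay_ms(0, 20, 5, 20, 0)

# Softphone installed on the server: audio crosses the network in both directions.
remote = echo_path_delay_ms(40, 20, 5, 20, 40)

print("local :", local, "ms, covered:", canceller_can_cover(local, TAIL_MS))
print("remote:", remote, "ms, covered:", canceller_can_cover(remote, TAIL_MS))
```

With these assumed figures the locally installed softphone stays inside the window and the server-hosted one does not. Real numbers will differ, and network jitter would make the remote case harder still, since the delay would not even be constant.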

Install the VoIP softphone client on the user's client device

You could also choose to bypass Citrix altogether and just install the softphone software directly onto the end user's client. Of course doing so pretty much means that Citrix would not be managing this application (unless you bundled the softphone with the ICA client, but that's still not a "real" Citrix-managed application).


Pros:

  • Best performance and quality. This is how the softphone was meant to be used.


Cons:

  • Lack of control over the softphone software.
  • The client device must be running on a platform that can support the softphone.
  • The softphone will require a separate install.
  • There may be connectivity issues since the softphone will use its own ports and protocols.
  • You won't have visibility into the performance of the softphone, and you'll have to balance "ICA versus SIP" traffic with some external performance tool.
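
One common way networks tell voice traffic apart from everything else is DSCP marking in the IP header, where voice media is conventionally marked "Expedited Forwarding" (DSCP 46). The sketch below shows how an application might request that marking on a UDP socket. It is only an illustration: whether a given softphone marks its packets, whether the operating system honors the request, and whether the routers along the path act on it are all assumptions, and the helper names are hypothetical.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding, the class conventionally used for voice media


def dscp_to_tos(dscp):
    """Convert a 6-bit DSCP value to the 8-bit TOS/traffic-class byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be between 0 and 63")
    # DSCP occupies the upper six bits of the byte; the lower two are ECN.
    return dscp << 2


def open_marked_udp_socket(dscp=DSCP_EF):
    """Open a UDP socket and ask the OS to mark outgoing packets with `dscp`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    except (OSError, AttributeError):
        # Some platforms reject or silently ignore this option.
        pass
    return sock


print(hex(dscp_to_tos(DSCP_EF)))
```

ICA traffic sharing the same link would typically carry a different (or no) marking, so the balancing act described above ends up being done by the network gear rather than by Presentation Server.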

The Hybrid Approach

The hybrid approach would involve some changes in the Presentation Server product from Citrix. The idea here is that you would still install the softphone software locally on your end users' client devices, but that Citrix would build a custom ICA virtual channel that would let you route the softphone SIP traffic through the ICA protocol up to the Presentation Server. Then the Presentation Server would communicate with the backend VoIP infrastructure from behind the firewall.


Pros:

  • The softphone could use the existing ICA connection to the server, making connection and firewall traversal easier.
  • Since the softphone software would be local to the user's workstation, this might actually work!


Cons:

  • You would still need to install the softphone client on the end user's workstation.
  • There still might be problems with quality, since VoIP traffic (SIP signaling and, above all, the RTP audio stream) typically runs over UDP, while ICA runs over TCP. (i.e., you'd have to wrap the UDP in TCP.)
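
For a sense of what "wrapping UDP in TCP" involves, here is a minimal sketch. UDP delivers whole datagrams, while a TCP-style channel is just a stream of bytes, so the sender has to mark where each datagram starts and ends. The 2-byte length prefix and the names `frame_datagram` and `Deframer` are my own assumptions for illustration. This is not how any actual ICA virtual channel works.

```python
import struct
from typing import List

# Assumed framing: each datagram is preceded by a 2-byte big-endian length.
HEADER = struct.Struct("!H")


def frame_datagram(payload: bytes) -> bytes:
    """Prefix one datagram with its length so it can travel over a byte stream."""
    if len(payload) > 0xFFFF:
        raise ValueError("datagram too large for a 16-bit length prefix")
    return HEADER.pack(len(payload)) + payload


class Deframer:
    """Rebuilds the original datagrams from arbitrary chunks of the stream."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add received bytes and return every datagram that is now complete."""
        self._buffer.extend(chunk)
        datagrams = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # the rest of this datagram has not arrived yet
            datagrams.append(bytes(self._buffer[HEADER.size:end]))
            del self._buffer[:end]
        return datagrams


# Two audio packets sent back to back, received in two uneven chunks.
stream = frame_datagram(b"rtp-1") + frame_datagram(b"rtp-2")
receiver = Deframer()
first = receiver.feed(stream[:4])
rest = receiver.feed(stream[4:])
print(first, rest)
```

The framing itself is the easy part. The quality concern comes from TCP's behavior underneath: when a segment is lost, everything behind it waits for the retransmission, so a single dropped packet can delay several audio frames that would have been perfectly usable had they arrived on time over UDP.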

Again, Orestes stressed that they're just in the early phases of starting to think about this. This means that we, as the community, can really have an impact on the direction they take. So what do you think? Is this something that's important? What would you do? What about adding video into this mix?

Join the conversation



some quick thoughts:

i don't imagine voice traffic sounding good over ICA.  i can't see myself talking to someone and hearing their voice on "medium" audio quality. 

consider that citrix apps will slow down while using the softphone on connections without much upload speed, like residential broadband or a busy hotspot.  you can get a feel for this if you have certain types of VoIP (some ISP-provided solutions expand the pipe with VoIP-only bandwidth, while others take from whatever bandwidth is available) - when you take a call, it uses up most or all of the 45k or so you get to upload, so your Citrix response time will slow down.  not good if you're using a sales entry app via Citrix and taking an order via softphone from a customer.  so you have a bandwidth limitation or bandwidth prioritizing issue to work through.  keeping the app fast and the phone call seamlessly uninterrupted on common connections in my opinion seems a few years out.

what i'm thinking would be worth working towards is if citrix indeed built or branded a VoIP product and kept it out of ICA, yet closely integrated to the user experience.  stream the app to the worker via Tarpon, make the management look and feel the same for admins, and tie the VoIP security and protocol pieces into whatever the next evolution (or two) of advanced access gateway/secure gateway/netscaler/wanscaler thing will be. 

so...  as a worker you can launch the VoIP app from the same place you launch and get your other apps, some locally installed (streamed) software does the QoS (between apps and voice), and the connection securely uses UDP on the backend outside the ICA protocol stream. 

i would think this is a 1-5 year solution cause bandwidth will continue to increase and phones will continue to add functionality, so the phone becomes the central computing device and it uses VoIP and it brokers data to apps displayed via auxiliary roll-up-stuff-in-your-pocket screens.

I don't think binding a virtual IP to the application would work. You're bound to the IPs left in the same segment the server(s) reside in. Imagine you have 400 users daily on your farm but only 190 free IPs... big problem. As far as I understand virtual IPs, you are limited to the segment the server resides in? Or do you actually mean you bind it to and let all VoIP phones ring with that number? I don't truly understand VoIP phones yet, but as far as I am aware each must have a unique IP address to dial to. Well, there is another disadvantage if you install the software on a Citrix server.
If they could make it like an ISA client communicating with an ISA firewall, you would have a unique session every time the user dials. But what about the traffic towards the client(s)?
I think Citrix has a long way to go with the physical limitations (IP addresses) before you can have a wide range of VoIP users on a Citrix farm.
The company I work for at the moment has VoIP phones separate from the Citrix farms, and users at home have the software installed on their computers. This works fine for now.

Ah, but you are not limited to one subnet for a server. You could multi-home it to a secondary network or add additional subnets to the main one. Every enterprise router can handle multiple IP ranges for a given collision domain...

VoIP softphones really don't work very well with the client mic and speakers, but they do work well with USB headsets and handsets, which are effectively a separate sound card.
Can Citrix virtualise these?
Cheers, Rob.

I've just been to a seminar on the roadmap for development of the Avaya IP Office telephone system, which we currently use reasonably successfully with Citrix with hard phones.
We looked at VoIP when we bought the new phone system earlier this year and dismissed it on the basis of cost, primarily the cost of the hard phones. The reason we wanted to use hard phones was partially for user familiarity, but the show stopper was the fact that softphones just don't work with the thin client terminals which we use throughout the office.
As I sat in the Avaya seminar, I had some thoughts:
  • In five years, hard phones will be a dwindling market, and if you're in the market of making them you need to diversify fast!
  • People in offices will be using very simple telephony devices that are in effect just a speaker and microphone. A little screen and even a keyboard will be very much optional.
  • These devices will be USB, Bluetooth, wireless or mobile connected.
  • Personally, I want to just use my mobile phone for all communication and for it to just work wherever I happen to be.
  • Citrix Presentation Server must encompass these changes, otherwise we'll be jumping ship.
The current show stopper for Citrix Presentation Server is its inability to use locally connected devices, currently USB being the most common mechanism. I'm actually a bit out of touch as to where we are with local devices, but I suspect that using things like Wyse thin clients is still a problem. We tend to use the cheapest thin terminals we can, which I agree might limit our local connectivity options. The simple reason we don't buy the more expensive variants is that we don't need their functionality at the moment, but if we did, I'd be recommending we buy PCs, as the cost of these more powerful thin clients is difficult to justify. That's another problem with the thin client world - thin terminals are too expensive compared to PCs and the hardware they contain. I know about the economies of scale :-)
So if you accept that Citrix PS needs to address locally connected devices, I got to thinking about VMware. That virtualises USB devices pretty flawlessly - if a VM has focus and you plug in a USB device, the VM installs the drivers and Bob's your uncle.
Could Citrix use a similar technology to virtualise locally connected USB devices? The ICA client doesn't really virtualise hardware in the same way as VMware and Virtual PC, but it could... Of course, this might be a problem for thin client devices, but that's only because of their design. Considering you can get full PC functionality for the same price as a thin terminal, it's possible, maybe :-)
Of course, you'd have to be able to control this - you don't want users triggering the installation of 3rd party device drivers all over the place! We'll be back to the bad old printer driver days ;-) You'd have to consider server farms as well. However, the two devices we want to connect now are USB pens and USB handsets/headsets, and these are recognised using Windows native drivers. Heck, you could even plug in a camera.
Somebody please tell me Citrix already does this ;-)

Just a note in regards to this conversation coming from a Network Admin perspective and QoS.

In the company-wide VoIP deployments I have seen, QoS has been utilised to give the highest priority to VoIP traffic, as it is a real-time application (can you class VoIP as an application?).  ICA, as it is not real time, has always been given a lower priority.  Historically this has been required because if VoIP packets become queued you get issues with jitter on the line, and eventually the call may drop out.

Now what seems to be occurring in XenApp is that it will be close to your PBX, so your VoIP traffic terminates at your XenApp box, and then you get audio coming from XenApp to your computer and going the other way.  However, if you are on a link that does not give a high priority to ICA, or you are connecting over the Internet, will this work reliably enough?  I am not convinced myself that this would work to a level where end users would consider it to be a reliable solution, especially when XenApp is a solution that has been sold into many companies as a method of connecting back to your corporate network in a reliable fashion over a thin protocol.

Just my 2 cents.