Understanding the role of client and host CPUs, GPUs, and custom chips in RemoteFX

One of the major announcements from Microsoft last week was that the technology they acquired from Calista back in 2008 would be known as "RemoteFX" and released via Service Pack 1 for Windows 7 and Windows Server 2008 R2. Last week I wrote an overview and recorded a video demo of RemoteFX, but those two pieces just started to scratch the surface. In today's article I'll dig deeper into RemoteFX, focusing specifically on the role the CPU, GPU, and custom chips play in host, client, TS, and VDI scenarios.

Microsoft designed RemoteFX to be flexible, allowing any combination of CPU-, GPU-, and custom chip-based encoding and decoding. Every host encoder works with every client decoder, and there's no difference on the wire between the different encoding or decoding methods.

RemoteFX on the host

Since RemoteFX is just an enhancement to RDP, it stands to reason that Microsoft will enable it in both of their major RDP host scenarios: RD Virtualization Host (VDI) and RD Session Host (Terminal Server).


For the VDI scenario, RemoteFX requires a GPU on the server. This GPU is then virtualized (a new capability of Hyper-V in 2008 R2 SP1) and presented to each VM just like any other physical hardware component. What we don't know at this point is how many VMs a single GPU will be able to support. Microsoft has said that they'll eventually come out with sizing guidance, pointing out that sizing is based not on VMs but on the number of screens and pixels a specific GPU can support. And of course you'll be able to add multiple GPUs to each physical server, either via the PCI riser card in the server chassis or via an external PCI chassis that can house lots of cards.
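Microsoft's point that sizing is per-screen-and-pixel rather than per-VM can be sketched as back-of-envelope arithmetic. The per-GPU encode budget below is a made-up number for illustration, not a Microsoft figure:

```python
# Hypothetical back-of-envelope sizing for RemoteFX GPU capacity.
# The per-GPU megapixel budget is an assumption, not a published spec.

GPU_MEGAPIXEL_BUDGET = 40.0  # assumed encode capacity per GPU, in megapixels

def megapixels(width, height, displays=1):
    """Total pixels a VM's displays ask the GPU to encode, in megapixels."""
    return width * height * displays / 1_000_000

def vms_per_gpu(width, height, displays=1, budget=GPU_MEGAPIXEL_BUDGET):
    """How many identical VMs one GPU could serve under this toy model."""
    return int(budget // megapixels(width, height, displays))

# A single 1280x1024 screen costs far less encode capacity than dual
# 1920x1200 screens, so the same GPU hosts many more of the former.
print(vms_per_gpu(1280, 1024))              # single-screen VMs
print(vms_per_gpu(1920, 1200, displays=2))  # dual-screen VMs
```

The exact budget number is invented, but the shape of the calculation is why Microsoft's guidance will be quoted in screens and pixels rather than VMs.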

One advantage of the GPU add-in card is that it allows RemoteFX encoding on the host without a major negative impact on the performance of individual VMs or overall server density. (Certainly this was evident in the video, where we saw six simultaneous Windows 7 VMs running video via RemoteFX with almost no practical impact on server CPU.)

In addition to the pure GPU(s)-based encoding option, Microsoft has also built a hardware design for a custom chip that could land in a server via an add-in card that's specifically built for RemoteFX encoding. (More on this custom chip later.) What's not clear to me at this point is if the custom add-in card is in addition to a GPU or replaces it. Either way we can assume that the custom card will provide more bang for the buck than the GPU option (since that's kind of the whole point of custom silicon).


For the RDSH/TS use case, a GPU is still required, but Hyper-V is not. (Interesting!) We can only assume this is because every RDSH session has access to the physical hardware, so you don't need to virtualize the GPU to make it available to more than one session. (If that's true, then we can probably also assume you can only get RemoteFX on virtual RDSH servers if they're running on Hyper-V, since other hypervisors wouldn't present the GPU to the VM.)

By the way, Citrix has gone on record saying they will support RemoteFX on RDSH via HDX in XenApp, although they can't provide any more detail at this time beyond, "yes, we're planning to support this."


It's not known whether Microsoft will support RemoteFX on physical hosts with GPUs. At first you'd think "Who cares? What's the use case for that?" But in addition to being a sweet home media center solution, supporting a "standalone" RemoteFX will potentially open the door to people running Windows 7 RemoteFX hosts on VMware ESX if VMware ever decides to virtualize the GPU.

RemoteFX on the client

RemoteFX clients can come in many different forms. Right off the bat there will be a new version of the RDP client which will enable RemoteFX for Win32 clients, including full PCs and Windows XP Embedded "thin" clients. We also saw clients with the custom chips—one based on WinCE and one based on Linux. All the major thin client vendors (including HP, Wyse, and DevonIT) have expressed support for the protocol.

Understanding the RemoteFX "custom chip" ASIC

We've mentioned that RemoteFX can use a custom chip (an "ASIC") on both the encoder and decoder side. This is conceptually similar to Teradici's implementation of PC-over-IP, but with one main difference: Teradici sells chips to hardware partners who put them into their own products, while Microsoft has merely designed the chips, which they'll license to partners to implement in whatever ways the partners see fit.

In practical terms this means that while Teradici's PC-over-IP chips are dedicated standalone chips, RemoteFX hardware will probably end up being built into other things, like System-on-a-Chip (SoC) products.

In the microchip world, it's all about gates. (As in, how many gates does this feature or that feature take?) Chip manufacturers know how big their gates are and how big their chips are, so a SoC vendor might say, "Hey, we have enough extra real estate on our chip to add the additional gates needed for hardware RemoteFX decoding." So ultimately RemoteFX is NOT another chip; it's just a matter of etching some more gates into SoCs that probably have the room anyway.

By the way, Microsoft doesn't come out and say that RemoteFX needs some specific number of gates, because the actual number can vary drastically based on implementation. A SoC that supports two displays will need more gates than one that supports a single display. And that's really the cool thing about the RemoteFX chips. How many sessions will a RemoteFX card support? How many displays and pixels? Is it built into a TV, a SoC, or a standalone card? Is it built into a server? This is all up to the specific implementation choices of the hardware vendors.
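The gates-scale-with-displays point can be expressed as a toy model. Every constant below is invented for illustration, since Microsoft publishes no gate counts:

```python
# Hypothetical model of how a RemoteFX decoder's gate count might scale.
# Both constants are invented; no vendor figures exist for this.

BASE_GATES = 500_000           # control logic shared by all displays (assumed)
GATES_PER_MEGAPIXEL = 200_000  # decode-pipeline cost per megapixel (assumed)

def decoder_gates(displays, width, height):
    """Estimated gate budget for a decoder driving `displays` screens."""
    mp = displays * width * height / 1_000_000
    return int(BASE_GATES + GATES_PER_MEGAPIXEL * mp)

# A dual-display SoC needs more gates than a single-display one, which is
# why there's no single answer to "how many gates does RemoteFX take?"
single = decoder_gates(1, 1920, 1080)
dual = decoder_gates(2, 1920, 1080)
print(single, dual)
```

The point of the sketch is only that the answer is a function of the vendor's display and resolution targets, not a fixed property of RemoteFX itself.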

Differences between RemoteFX and VMware's PC-over-IP

The main difference is on the host side, since RemoteFX requires a GPU or custom plug-in card. This is something that VMware blogger Mike Coleman jumped on immediately, claiming that customers don't want to install GPUs in their servers. Whether that's true remains to be seen. I mean sure, no one has GPUs in their servers today, but that's because there's never been a reason to. And VMware's software-based host encoder for PC-over-IP is CPU-intensive and affects user density in a big way. (i.e. If you enable it, you get fewer users per server.) So in VMware's world you don't have to install GPUs in your servers, but you have to build extra servers to make up for the density you lose by encoding PC-over-IP on the CPU?!?
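The GPU-vs-extra-servers trade-off boils down to cost-per-user arithmetic. Here's a sketch with entirely hypothetical prices and density numbers:

```python
# Hypothetical cost comparison: GPU offload vs. CPU-based protocol encoding.
# Every number here is an assumption for illustration only.

SERVER_COST = 5000         # assumed cost per server
GPU_COST = 1500            # assumed cost of a RemoteFX-capable GPU
USERS_PER_SERVER = 100     # assumed baseline density with hardware offload
CPU_ENCODE_PENALTY = 0.30  # assumed density loss when encoding on the CPU

def cost_per_user_gpu():
    """Cost per user when a GPU carries the encoding load."""
    return (SERVER_COST + GPU_COST) / USERS_PER_SERVER

def cost_per_user_cpu():
    """Cost per user when CPU encoding shrinks the user count per box."""
    effective_users = USERS_PER_SERVER * (1 - CPU_ENCODE_PENALTY)
    return SERVER_COST / effective_users

# Which approach wins depends entirely on the real penalty and the real
# GPU price -- exactly the open question in the article.
print(cost_per_user_gpu(), cost_per_user_cpu())
```

With these made-up inputs the GPU comes out ahead, but flip the penalty or the GPU price and the conclusion flips too, which is why the debate is unresolved.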

At this point we don't know which approach will end up having the biggest impact in terms of cost, power, and rack space, but I'm sure we'll know soon enough.

Join the conversation



RemoteFX is a fantastic addition to the mix - and as you've stated, the best part is the flexibility around implementation.

Having this enabled for sessions (RDS/TS) as well as desktops (HVD/VDI) is crucial and it'll be awesome to see how the technology partners take this and extend and integrate it into their own offerings - WAN performance/capability will be the measuring stick as always.



Am I missing something here? Or did Microsoft & Citrix just say no to Hyper-V & XenDesktop on Cisco UCS & any other blade system? How are blade systems going to support RemoteFX?




Just to be clear, as it is implemented currently, RemoteFX requires a GPU within the host node under all virtualization scenarios.

That will severely hinder its utilization on blade servers in the short term.  It's also interesting that Intel hasn't made a peep about RemoteFX.  Honestly, I would hate to see a bunch of proprietary display protocols that require separate ASIC/SoC implementations within a server.  Talk about Betamax vs. VHS!

Also, it would appear that container-based virtualization platforms that will eventually support Win2k8 R2 SP1, such as Parallels Virtuozzo Containers, would be able to take advantage of RemoteFX using a GPU within the host node.


Not trying to do an ad for any product: GPU virtualization is on the XCI (Xen Client Initiative) [open source] roadmap, I think... something we could probably get from a Xen-compliant hypervisor in the future. And there are only 3 chip manufacturers (AMD, ATI, and Intel)...


"This is something that VMware blogger Mike Coleman jumped on immediately, claiming that customers don't want to install GPUs into their servers."

Well, within 12 months at the latest, no one will have to, because of AMD's Fusion chips with integrated (multicore) GPUs, so lots of available GPU power by then.

Likely Intel will then jump on the bandwagon, too.

Very likely, Microsoft had that in mind when designing RemoteFX.


An open question... Are you bothered (for whatever reason) if you have to add GPUs to your servers?

Personally, for my institution it makes little difference. Depending on VMs per GPU, it might not actually cost "that much".

But then every use case/business is different.

I agree it probably won't be long before servers start coming with this stuff as standard anyway...


If RFX can only be used on Hyper-V at the moment then what does that say about RFX for the client when client hypervisors are thrown in the mix?

With the recent announcements, the exclusion of certain information about how deep the collaboration between MS and Citrix goes leaves me wondering about the possibilities.

In my opinion, the aforementioned integration of HDX and RFX will hopefully pave the way for the following:

- XenClient use of HDX/RFX

- XenServer use of HDX/RFX

In other words, integrating the Remote Display Protocol into the hypervisor as mentioned in a previous article.

If Citrix wants to keep XenServer relevant, they will most likely have to present a case (other than bundling XenServer Essentials) where it is more attractive to choose XenServer over Hyper-V when doing XenDesktop deployments.

Maybe that's where XenClient can come into play, offering advanced features for XenDesktop if paired with XenServer.



Let me clarify Mike Coleman’s comments about GPUs in servers. Customers may *want* to install GPUs in their servers, however, GPUs will not physically fit into most servers. For example, the GPU that Microsoft used for their demonstration was an NVIDIA FX5800. This card is a double-wide x16 PCIe card that consumes 189 Watts. The vast majority of virtualized servers are either Blade servers or 2U rack servers. If you survey the PCIe slots available in these servers from the major server OEMs, none of them have a double-wide PCIe slot (i.e. no place to plug this card). In addition, the FX5800 retails for about $3,000. A Dell 2970 server with dual, hex-core Opterons is available on their website for less than $2,000, so it will be less expensive to buy another server than to use an FX5800 as a RemoteFX offload engine.

The slot size and cost are not even the biggest issues. Rack and Blade servers typically support only 25 Watts per slot due to the cooling capacity of these dense servers. An NVIDIA FX370LP is only 25 Watts so could be used in 2U servers that have a x16 slot (none have more than one x16 slot). The FX370LP has only 8 “CUDA cores” compared to the FX5800 which has 240 “CUDA cores”. This means it will support about 30 times fewer displays when offloading RemoteFX.
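The "about 30 times fewer displays" figure is straight division over the published CUDA-core counts (the assumption that display capacity scales linearly with cores is the commenter's):

```python
# The commenter's "30 times fewer displays" claim is simple division over
# published CUDA-core counts; linear scaling with cores is his assumption.

FX5800_CORES, FX5800_WATTS = 240, 189   # double-wide x16 PCIe card
FX370LP_CORES, FX370LP_WATTS = 8, 25    # low-profile card that fits a 25 W slot

core_ratio = FX5800_CORES / FX370LP_CORES

# Cores per watt shows the same squeeze from the other direction: the
# 25 W slot limit buys disproportionately little compute.
fx5800_cores_per_watt = FX5800_CORES / FX5800_WATTS
fx370lp_cores_per_watt = FX370LP_CORES / FX370LP_WATTS

print(core_ratio, fx5800_cores_per_watt, fx370lp_cores_per_watt)
```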

Between now and the availability of RemoteFX, the GPU vendors could release some newer 25 Watt GPUs on x8 PCIe cards so that more than one GPU can fit into a server, but the 25 Watt power limit will significantly constrain the number of “CUDA cores” that can be included and, therefore, significantly limit the number of displays that can be supported. The other alternative is to put all the GPUs into a PCIe expansion chassis which adds additional cost and takes up valuable space in the datacenter. Neither option is very attractive.

The fundamental reason that GPUs are not a good choice for image compression is that they were designed and optimized for a completely different purpose: rendering pixels for CAD, animation, and gaming. Image compression primarily works with 32-bit pixels which allows fixed-point arithmetic to be used. GPUs must use 32-bit floating-point arithmetic to render 3D images. Floating-point arithmetic requires massively more gates in the silicon than fixed-point arithmetic. This makes GPUs power- and cost-inefficient when doing image compression (in fact, using the multimedia instructions in the server CPU for software image compression is a less costly approach).
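To illustrate the fixed-point argument, here's a minimal sketch of a typical compression-style pixel operation done entirely with integer shifts and masks, the kind of math custom silicon implements cheaply:

```python
# Sketch of why image compression favors fixed-point math: averaging two
# 32-bit RGBA pixels needs only integer shifts, masks, and adds, no floats.

def average_pixels(p1: int, p2: int) -> int:
    """Average two 0xAARRGGBB pixels channel-by-channel with integer math."""
    result = 0
    for shift in (0, 8, 16, 24):      # B, G, R, A channels
        c1 = (p1 >> shift) & 0xFF
        c2 = (p2 >> shift) & 0xFF
        result |= ((c1 + c2) // 2) << shift
    return result

# Opaque black averaged with opaque white gives mid-gray in each channel.
print(hex(average_pixels(0xFF000000, 0xFFFFFFFF)))
```

Each channel fits in 8 bits, so the whole operation stays in integer hardware; a GPU's floating-point units would spend far more gates doing the same work.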

Don’t get me wrong, the vGPU capability that Microsoft pre-announced will be of value to workstation customers that want to share a beefy GPU between multiple users of 3D applications. In fact, it’s interesting that the “server” Tad was using in your video is actually a Dell Precision Workstation. However, the sharing of a GPU between virtual machines will not likely be unique to Microsoft.

Finally, you stated,

“And VMware's software-based host encoder for PC-over-IP is CPU intensive and affects user density in a big way. (i.e. If you enable it, you get fewer users per server)”.

This is absolutely not consistent with benchmarking we have done. VMware View 4 is achieving the same server consolidation ratios as VMware View 3 did with RDP while XenDesktop 4 consolidation ratios are 20-30% lower than either.



I'm not sure that representing GPUs as just rendering engines is accurate. GPUs should really be considered floating-point math processors that can also be used as very effective graphics processors.  You only have to look at the non-graphics uses that high-end GPUs are put to (computational chemistry, molecular dynamics, data mining and analytics, etc.) to see that. NVIDIA is pushing its Fermi-architecture GPUs as high-performance computing systems with only a passing reference to their graphics-processing roots, claiming that they "... deliver equivalent performance at 1/20th the power consumption and 1/10th the cost" of the latest quad-core CPUs.

Having said that, I agree with your points on the difficulty of shoehorning GPUs into blade systems; there is neither the space, power, nor cooling capacity to get anything but the smallest of GPUs into a blade server.  However, if vendors see an opportunity here, we can be assured that they will find ways to overcome the problem (dedicated GPU servers/blades may be a possibility).  This may also turn to your advantage: if Cisco were to offer a "RemoteFX Compression Blade" for UCS, there is every reason to expect that a PCoIP Compression Blade would also be offered.

We must also remember that GPU-based RFX is only one option. If Teradici and VMware can create a s/w implementation of the PCoIP engine that does not impact server consolidation ratios, then there is no reason to think that Microsoft can't do so as well. Personally, I find it hard to accept that anyone can develop a s/w solution that does not impact server scalability to some degree, and I would suggest that comparing the performance of View 3 with View 4 + PCoIP is not valid.  Perhaps you should share the View 4 RDP and View 4 PCoIP performance results instead.




In terms of the RDSH/TS use case for RemoteFX being enabled for WTS or RDS on physical tin: in our case, our manufacturers are starting to hold centralized video-based training that needs to be delivered in real time to lots of end users on a TS Session Host, which reduces the need for flights, hotels, etc. I for one would like to see RemoteFX working outside of Hyper-V; RDP has lacked the rich desktop experience from the start, and to me this only plugs the hole.

I wonder how this would work with multi-core GPUs when we finally get the same number of cores on a GPU that we can get on a CPU? A pity Intel abandoned Larrabee; Hyper-V or ESX could have allocated a GPU to a VM, or to multiple VMs via the virtual GPU. Hmmmm


I don't believe RDSH requires a GPU...  in fact, I know it does not...  There is an offload card coming to enable scale, but it's more of a compression-type card to offload those tasks from the CPU and bring the scale back to the densities you expect from RDSH...