Microsoft’s RDP host-side rendering plans include optional GPU offload hardware & custom chips.

Earlier this week I wrote about Microsoft's last-minute change to RDP in Windows 7, where all DirectX components will now be rendered on the host. I based my post on an MSDN blog post by Microsoft employees Christa Anderson, Gaurav Daga, and Nelly Porter.

In the comments of the original MSDN post, reader David Rottenberg wrote that this change was a bad idea, partly because host-side, software-based DirectX rendering would be slow. Microsoft's Gaurav Daga responded, in part:

As for running DirectX applications on Windows Server 2008 R2 Hyper-V virtual machines, there will be the GPU offload hardware assist Calista technologies at some point in the future.

What's this?!? GPU offload hardware? I knew the concept existed (both with things like NVIDIA's Tesla hardware and with ASIC-based solutions like Teradici's PC-over-IP). However, I didn't know that Microsoft was planning this for RDP. I did a quick Google search for [calista hardware] and found this PowerPoint presentation by Nelly Porter from WinHEC 08, the same conference that prompted my November 2008 update about Calista.

Even though I only have the slides, without notes or video, there are a few interesting points to be found here. First, Slide 10 shows three different modes of host-side rendering in RDP 7+:

  • GPU-based codec
  • ASIC-based codec
  • CPU-based codec

[Slide 10 image: ENT-T591_WH08-10]

My assumption is that the GPU-based codec uses the host-side GPU to render screen elements. I would think that in this case we'd be limited to a one-to-one GPU-to-client relationship, although I don't know that, and it certainly may be possible to "slice" the GPU so that multiple sessions or VMs can share it. Alternatively, we may be able to use an external device full of GPUs (like the Tesla stuff).

That said, Daga's quote mentioned that "there will be the GPU offload hardware assist Calista technologies at some point in the future," meaning they're not here now (i.e., this is something new that does not exist yet). So what does this mean? Will Microsoft sell GPU hardware cards? Or, more likely, will companies like HP and Dell sell rackable boxes of GPUs (kind of like how you can buy DAS from them today)?

For the ASIC-based codec, my assumption is that it will also be a future product that will be a lot like Teradici's PC-over-IP Terachip. I assume that Microsoft will design a special chip (the "ASIC") that is purpose-built for rendering host-side graphics and sending them to clients via RDP.

And finally, the CPU-based codec is kind of like what we already have today, where the normal host-side CPU resources will be used to render all the screen elements.
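To make those three modes a bit more concrete, here's a minimal sketch of what the host-side choice might look like. Everything here (the names, the detection logic) is my own illustration based on the slide, not anything Microsoft has published, and the real selection may not even be exposed to admins.

```python
from enum import Enum, auto

class HostCodec(Enum):
    GPU = auto()    # render/encode on a host GPU (possibly shared or external)
    ASIC = auto()   # purpose-built encode chip, PC-over-IP style
    CPU = auto()    # pure software fallback, like today's RDP

def pick_host_codec(has_asic: bool, has_gpu: bool) -> HostCodec:
    """Hypothetical 'best available' selection; purely illustrative."""
    if has_asic:
        return HostCodec.ASIC   # dedicated silicon: best density per watt
    if has_gpu:
        return HostCodec.GPU    # offload rendering/encoding from the host CPU
    return HostCodec.CPU        # always works, but eats host CPU cycles

print(pick_host_codec(has_asic=False, has_gpu=True))   # HostCodec.GPU
```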

Continuing through the WinHEC slides, we also find that Slide 22 is interesting:

[Slide 22 image: ENT-T591_WH08-22]

This is talking about the future of Calista. (The fine print at the bottom says that the "release vehicle" was not known at that time, meaning they didn't know whether this would ship in Windows 7 or at some point afterward.)

This slide shows a few different rich media apps--Flash, Silverlight, Real, and Windows Media--that all sit on top of a "Calista Graphics Intercept & VGPU." (I assume that means "virtual GPU"?) That whole stack sits on top of the Calista Device Driver, which in turn rides on top of either the GPU, ASIC, or CPU. My assumption here is that this means you get the same functionality regardless of which rendering engine you're using (although obviously performance and scalability will be affected).
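If you read that diagram as a software stack, here's a rough sketch of the layering I think it implies. All of the class names are my invention; the slide only gives the layer labels.

```python
class RenderBackend:
    """Bottom layer: a GPU, an ASIC, or plain CPU rendering (per Slide 10)."""
    def render(self, draw_calls):
        raise NotImplementedError

class CpuBackend(RenderBackend):
    def render(self, draw_calls):
        return f"frame rendered in software from {len(draw_calls)} calls"

class CalistaDeviceDriver:
    """Middle layer: hides which backend is actually underneath."""
    def __init__(self, backend):
        self.backend = backend
    def execute(self, draw_calls):
        return self.backend.render(draw_calls)

class GraphicsInterceptVGPU:
    """Top layer: what Flash/Silverlight/Windows Media 'see' inside the guest."""
    def __init__(self, driver):
        self.driver = driver
    def draw(self, draw_calls):
        # In real life the resulting frame would be encoded and sent over RDP.
        return self.driver.execute(draw_calls)

vgpu = GraphicsInterceptVGPU(CalistaDeviceDriver(CpuBackend()))
print(vgpu.draw(["clear", "blit video frame", "compose UI"]))
```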

What's most interesting to me about this is the "VGPU" and how that whole diagram is inside the green box called "Windows Client in Hyper-V Guest." Does that mean that Calista requires the VGPU, and therefore requires Hyper-V? (Because I don't think any other hypervisor exposes a VGPU to the guests?) If so, wow, wow, and WOW!

It's also interesting to see (in the list to the right of this slide) that an "Optional HW ASIC-based decoder enables ultra-lightweight client scenarios." Umm... PC-over-IP, anyone? So the host side can be all software (CPU), or can use host GPU(s) or a custom plug-in card, all of which require Hyper-V. And the client can be software (probably for rich clients) or can have a chip for an ultra-thin client.

Wow! Does VMware stand a chance? Does Citrix? Does Teradici?

I’m not sure when this information was made public. (I mean obviously the stuff at WinHEC was all public. I just don’t know when this info made it out to the web.) But I haven't read anything about this, so let’s discuss!

13 comments

Adding GPU virtualization into the hypervisor is a logical next step for all of the providers. Of course MS wants to lock the entire industry into Hyper-V with a sub-par RDP protocol, and they will never innovate as fast as the rest of the industry. Being the monopoly they are, they will continue to leverage their Windows franchise to dominate the desktop. I think Citrix and VMware will be around for a long time, because in the real world customers want choice to gain price leverage over Microsoft. If Microsoft is allowed to become too strong, we will get even worse products from them more slowly, the move to the cloud will be slower, and they will create massive inertia against moving to Mac, Linux desktops, etc. So yeah, good move, years late; RDP 7 is not as good as people hype it up to be, and I expect the others will just do more, faster. I have little faith in anything Microsoft, except that they will provide the CORE pieces of the infrastructure.


VMware, Microsoft, and Citrix are increasingly promoting host rendering, because host rendering solves many issues with remote desktop delivery over LAN and WAN networks.


Host rendering results in the need for a hardware *option* in the server.  Teradici's PCoIP technology supports software PCoIP (software flexibility/mobility) and hardware PCoIP (scalability, power efficiency and performance).  


Client Rendering Issues:


No Investment protection - client rendering requires constant evolution and "catch-up" to solve application interoperability issues and support new graphics and media codecs.  Often protocol upgrades require new client hardware.  Fortune 100 companies tell us that their average TC is obsolete in <3yrs.  Bad for the ROI case.


Increased client maintenance costs - OS patches, anti-virus and constant media-capability upgrades = constant maintenance. Requires a GPU in the server AND an expensive GPU in the client. Again bad for ROI.


Application interoperability and user experience issues - applications and users must wait for remote rendering to be reliably delivered and executed in order. Just try a simple directory listing in a DOS box on an RDP/ICA thin client to see this with only simple text graphics.


Ceiling on desktop performance - remote rendering imposes limits on 2D/3D graphics performance due to the sequential execution of OpenGL/DirectX commands, which now have network latency slowing graphics and application execution (see the back-of-the-envelope sketch after this list). This is a fundamental limitation of client rendering solutions.


Poor multimedia support - Flash redirection is actually host-transcoded to a media codec the client can decode. No video codec convergence is expected any time soon. Try running Silverlight and Video Furnace with the client-rendered modes of RDP and ICA.


Poor network performance - sending rendering commands requires lossless/reliable delivery, which limits compression options and forces a significant increase in network bandwidth requirements. Remote rendering performance degrades significantly with congestion, latency, and packet loss.
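A quick back-of-the-envelope sketch of that latency ceiling (the numbers below are invented purely for illustration, not measurements):

```python
# Every graphics call that must block on a round trip to the client adds
# network latency directly to the frame time, no matter how fast the GPU is.
rtt_ms = 30.0                  # assumed WAN round-trip time
blocking_calls_per_frame = 10  # assumed synchronous OpenGL/DirectX calls
local_render_ms = 5.0          # what the rendering itself would cost locally

frame_time_ms = local_render_ms + blocking_calls_per_frame * rtt_ms
print(f"~{1000.0 / frame_time_ms:.1f} fps ceiling")   # roughly 3 fps
```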


Host rendering solves these issues for both thin clients (software PCoIP in future VMware View) and hardware zero clients (hardware PCoIP; see the Dell FX100, IBM CP20, Samsung 19" 930ND, Fujitsu, Amulet-Hotkey, ClearCube, DevonIT, EVGA, ELSA, Verari).


Stu - Dir of Biz Dev @ Teradici.  


Does upgrading my server HW fleet with Teradici really justify the ROI? Do you really believe that 80% of use cases need this level of performance? Also, PC costs can be extended out to 5-7 years with VDI if the PCs are repurposed as nothing more than dumb connection machines for the masses.


Actually, with Windows Server 2008 R2 RDS on physical hardware we can already offload the application's 3D rendering to the GPU on the server and then send the application using bitmap remoting. Of course this means we need to start adding GPUs to our servers.
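Conceptually, that pipeline looks something like the toy sketch below. This is only an illustration of the idea (render on the host GPU, then ship compressed pixels), not the actual RDP code path, and zlib here just stands in for RDP's real bitmap codecs.

```python
import zlib

def remote_one_frame(render_on_gpu, send_to_client):
    """One iteration of host-rendered bitmap remoting (toy sketch only).

    render_on_gpu() stands in for the server GPU doing the real 3D work;
    the client never sees DirectX calls, only compressed pixels.
    """
    pixels = render_on_gpu()          # host-side GPU renders the app's output
    payload = zlib.compress(pixels)   # stand-in for RDP's actual bitmap codecs
    send_to_client(payload)           # ship the encoded bitmap over the wire

# Toy usage: a 100x100 "frame" of flat gray pixels.
remote_one_frame(lambda: bytes([128] * 100 * 100 * 3),
                 lambda data: print(f"sent {len(data)} compressed bytes"))
```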


Keep looking at our RDS blog, where we will reveal more information on Windows Server 2008 R2 features and capabilities.


Yes, we are sharing the ROI details with Fortune 1000 customers now.  


I agree - the main reason for selecting host acceleration hardware will be for VDI scalability - more VMs per server due to the compression/encryption/network transport off-load.  


Having said that, some will need the additional desktop performance too.  But scalability will drive the ROI.  


Also agree that re-purposing PCs is a great way to lower the CAPEX costs of a VDI installation. That, along with supporting notebooks for mobility, drove Teradici to develop a software PCoIP client, which we have licensed to VMware for integration into their VMware View products.


VMware View with software PCoIP will maintain compatibility with the hardware zero clients (where these are needed) and with future server PCoIP virtual desktop accelerator products. IT gets the complete flexibility it needs to balance cost, mobility, scalability, and desktop performance.


@stuart


Traditional TS gives me the best ROI and scale on the back end. That said, I am curious: is there any technical reason PCoIP can't help with the TS use case?


@appdetective


>Does upgrading my Server HW fleet with Teradici really justify an ROI. Do you really believe that the 80% of use cases need this level of performance? Also the PC costs can be extended out to 5-7 years with VDI if they are repurposed as nothing more than dumb connection machines for the masses.


If software solutions cover 80% of the use cases, why are remote desktops <<10% of the enterprise desktop market? For use cases that are 100% satisfied by software solutions, PC-over-IP hardware acceleration technology will have limited benefit. However, since the marketplace indicates that >90% of use cases are not covered by software solutions today, PC-over-IP server hardware will provide scalable usability, which will significantly expand the available market for remote desktops well beyond the single-digit market share it has had since its inception.


As Stu points out, repurposed PCs are a great way to reduce initial capital expenditures of a remote desktop deployment provided they have sufficient performance to deliver the required user experience. However, there is the ongoing operating expense of virus protection and operating system upgrades, not to mention all the wasted power, that continue as long as you keep those PCs around. When the PC finally bites the dust, replacing it with a hardware zero-client rather than another PC is a no-brainer (especially with PCoIP integrated monitors). It remains to be seen whether there will be Calista zero-clients or whether a Windows OS will always be required. Anyone want to place a bet...?


>Traditional TS gives me the best ROI and scale on the back end. That said, I am curious: is there any technical reason PCoIP can't help with the TS use case?


Again, for use cases where TS is adequate, go with it. PC-over-IP technology allows remote desktops to boldly go where none has gone before. If the question is whether PC-over-IP acceleration could be used in a TS environment as well as a virtualized environment, the answer is yes, provided Microsoft allowed a consistent way to force Terminal Services to operate in a purely host-rendered mode. Today, that capability is not exposed in an officially supported API.


Randy Groves


CTO


Teradici


@Randy,


I'd beg to differ on your premise about the % adoption of VDI. It has more to do with upfront CapEx, because the big players (Citrix, VMware, etc.) still believe that network storage is the way to do this to achieve OpEx efficiency. This overly complex model is years out due to the immaturity of the management tools, the time it will take for admins to shift mind-set, and of course the economy. Add to that that to drive such a model customers must understand their own inventory, and in so many cases that is just a mess. Only forward-thinking people are doing this today, for different reasons that I have posted numerous times prior.


So depending on where you are in your PC cycle (in my case a mixed bag), I am simply not buying new ones with VDI, and I can easily reduce the management OpEx of those machines by rethinking how those builds are composed, etc. Granted, there are still use cases where ICA performance is not there today, but I can offload much of that stuff to the local host in many use cases, or simply redistribute my fleet to deal with horsepower, as there is ZERO capital spend on PCs in today's economy. That includes the justification for custom hardware on the back end as well. Where I do buy, it will be cheap commodity hardware to allow for scale and to create pricing leverage with my suppliers. I'm not saying this is perfect all around, so I do what I can within the constraints that I have, and I am sure many others do the same. In cases where none of these solutions work, users will simply be left with their existing PCs until they die, and by then I hope ICA has many of the Apollo features ready, since that's what I use to address broad use cases. I wish ICA were a standard and Citrix would open it up a bit to avoid all this confusion around protocols; we don't have HTTP from 400 vendors, they just add to it. If not, then perhaps there are longer-term niche use cases for hardware-based solutions, but I just don't see the ROI story on something that is not commodity. A different story if HP, Dell, etc. shipped with your solution...


TS with PCoIP would be really cool to see, even if it was just an unofficial video, so customers can go back to MS, etc., and ask why the monopoly is not giving them choice and is slowing innovation.


Nothing personal here, and I am glad you posted your thoughts. Just my perspective from being beaten over the head for years doing real-world implementations. Best of luck to you; I hope you help shape the market.


Certainly Fibre Channel storage is a non-starter, but deduping technologies like View Composer and less expensive iSCSI network storage have essentially closed the storage cost gap for VDI.


From a protocol perspective, since the PCoIP protocol is host-rendered, it will support all of your use cases from day one, rather than having to constantly play catch-up with new codecs and applications like the client-rendered protocols do. So even if you're not sold on the value of hardware acceleration, you'll still find huge value in the software PCoIP implementations, which enable use cases that have not been possible with existing software solutions.


Even with iSCSI, the price per I/O today does not make sense for desktop workloads. A simple example: boot thousands of desktops and watch the cache requirements and cost of the storage array go through the roof if you want your OS to boot without errors. iSCSI is great for data volumes, but the cost is too high for desktops, and that's based on real-world experience and on listening to all the pitches from and testing of NetApp, EMC, HP, etc. The biggest problem with VMware is that they are a storage company trying to sell desktops; they have no clue how the real world works and keep marrying crappy $$$$ storage solutions to their offering. Citrix keeps copying instead of doing what they are good at: delivering stuff to the enterprise the right way. Emerging companies like Atlantis and Unidesk could potentially address this for DAS, but right now nothing that is a product...


Brian: Your insight is right on.


>"there will be the GPU offload hardware assist Calista technologies at some point in the future," meaning they're not here now. (i.e. this is something new that does not exist yet.)  So what's this mean? Will Microsoft sell GPU hardware cards?"


>I assume that Microsoft will design a special chip (the "ASIC") that will be purpose built for rendering host-side graphics and sending them to clients


Correct. There is a patented Microsoft GPU. It does exist, and the very small SBC it resides on is ready for production. It works great. Anyone with this very inexpensive, tiny box will not need a PC to experience Microsoft software.


>"Optional HW ASIC-based decoder enables ultra-lightweight client scenarios." Umm.. PC-over-IP anyone? So the host-side can be all software (CPU), or can use host GPU(s) or a custom plug-in card, all of which require Hyper-V. And the client can be software (probably for rich clients) or can have a chip for an ultra thin client.


Precisely. Projections are for several tens of millions of units, aimed at developing countries where PCs are expensive to buy and expensive to run.


The key to virtualizing graphics and multimedia is to use the resources on both the server and the client in the most efficient manner. Server-side rendering provides the foundation for remoting any type of graphics or multimedia content, and it is great to see Microsoft continuing to enhance the capabilities of the Windows Server platform. Citrix offers HDX Adaptive Orchestration, built into both XenDesktop and XenApp. In addition to supporting server-side rendering, Adaptive Orchestration opportunistically leverages available endpoint resources to offload the server. This not only increases scalability (thus reducing costs), but also often yields huge savings in network bandwidth requirements. It isn't going to be just client-side or just server-side. It is all about dynamically selecting the right approach for the scenario at hand.
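As a rough illustration of that "dynamically selecting the right approach" idea, here is a toy decision sketch. The thresholds and logic are my own illustration only, not Citrix's actual HDX heuristics.

```python
def choose_rendering_side(client_can_decode: bool,
                          bandwidth_kbps: int,
                          latency_ms: int) -> str:
    """Toy decision logic only; the real HDX heuristics are not public here."""
    if client_can_decode and bandwidth_kbps >= 2000 and latency_ms <= 50:
        return "client-side rendering (offload the server, save bandwidth)"
    return "server-side rendering (host does the work, client just decodes)"

print(choose_rendering_side(True, 5000, 20))    # capable endpoint, good network
print(choose_rendering_side(False, 512, 120))   # thin endpoint or poor network
```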


OnLive.com: they designed a gaming platform and solved a whole host of VDI issues in a cloud environment.


I suspect this model will prove the validity of GPU offload tech, with such large and young companies in the game.

