Notes from BriForum: Bandwidth implications of HDX and PCoIP. (My HDX might be different from yours)

The choices you make in terms of remoting protocol and operating system can have a significant impact on your VDI bandwidth consumption.

Disclaimer: This is not a “protocol wars” article. They all work fine. Some just work better than others in certain situations.

One of my favorite BriForum sessions from this past year was one from Nick Rintalan and Dan Allen entitled “Protocol and Resolution Impact on Bandwidth and Scalability,” which sounds more like a PhD dissertation than an awesome session, but I learned quite a bit and wanted to share some of my notes here. The session went into lots of detail about the differences in network and CPU utilization between the main remoting protocols, as well as tuning tips and best practices for ensuring the best experience. There is way too much detail to cover in one article, so I’ll focus on the network and CPU utilization numbers for now.

The protocols covered include HDX and PCoIP (they also tested VMware’s Blast, but not the latest version, so I’m leaving that out). If you’ve been plugged into the space for a while, you’ll know that HDX actually consists of several different protocols, and the one you use depends on use case, guest OS, client device, and sometimes network conditions. These different protocols (ThinWire, ThinWire Plus, Framehawk, and H.264) all contribute to the flexibility of HDX across many devices and connection types, and they show how desktop virtualization has had to evolve over time to support those new devices and connections.

I wrote an article about the parts that make up HDX a few months ago, so you can refer back to that for more details (or at least more words), but to save you the time I’ll go through them a bit here (with a rough sketch of how they fit together after the list):

  • ThinWire – The first ICA protocol, now called ThinWire Legacy or ThinWire Compatibility. It uses Microsoft GDI remoting, but since GDI remoting was removed in Windows 8, ThinWire Legacy only works on Windows 7 and older operating systems.
  • ThinWire Plus – This is the name for the latest generation of ThinWire, which was made for Windows 8 and newer. It’s based on Direct2D, which replaced GDI and GDI+.
  • H.264 – Citrix and others have put a lot of effort into H.264-based protocols because the H.264 codec has become the video codec of choice for both quality and bandwidth consumption. Most browsers natively support it, and many hardware vendors include both encoders and decoders in their systems. Even the Raspberry Pi has an H.264 decoder in it, making Raspberry Pi thin clients possible.
  • Framehawk – The technology that Citrix acquired back in 2014 was added to XenApp/XenDesktop 7.6 FP2, and is intended to support extreme network conditions. Framehawk’s pedigree dates back to its inventors’ work in spacecraft communication, so it’s specifically tuned for high latency, low bandwidth scenarios.
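
To make the relationship between use case and graphics mode a little more concrete, here’s a rough sketch of that selection logic in Python. To be clear, this is purely illustrative: the function, its parameters, and the thresholds are my own inventions, not anything Citrix exposes, and the real logic lives inside the VDA.

    # Hypothetical sketch of how HDX picks a graphics mode. All names and
    # thresholds here are illustrative assumptions, not Citrix code or APIs.
    def pick_hdx_graphics_mode(guest_os, latency_ms, graphics_heavy, client_decodes_h264):
        if latency_ms > 200:
            # Extreme network conditions: Framehawk is tuned for
            # high-latency, low-bandwidth links.
            return "Framehawk"
        if graphics_heavy and client_decodes_h264:
            # Video playback and 3D apps favor the H.264 codec.
            return "H.264"
        if guest_os in ("Windows 7", "Windows Server 2008 R2"):
            # GDI remoting still exists on these older OSes.
            return "ThinWire Legacy"
        # Windows 8 and newer lost GDI remoting, so ThinWire Plus takes over.
        return "ThinWire Plus"

    print(pick_hdx_graphics_mode("Windows 10", latency_ms=5,
                                 graphics_heavy=False, client_decodes_h264=True))
    # -> ThinWire Plus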

Nick and Dan compared these four elements of HDX along with PCoIP from VMware in their session. The test they ran was based on the LoginVSI Knowledge Worker workload, but with the video component removed, on a network with 1ms of latency and a 3Mbps bandwidth cap. Each protocol was tested at multiple resolutions (a single 1024x768 monitor, and dual monitors at 2560x1440 + 1600x900), using VDI-optimized builds of both Windows 7 x64 and Windows 10 x64 with 2 vCPUs and 4GB of RAM.
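
If you want to recreate the network side of that test bed in your own lab, Linux’s netem queueing discipline can impose the same constraints. Here’s a minimal sketch, assuming a Linux box routing traffic between client and host; “eth0” is a placeholder for whatever interface carries the VDI traffic, and you’ll need root plus the iproute2 tools:

    import subprocess

    # Shape a lab link to match the session's test conditions:
    # 1ms of added latency and a 3Mbps rate cap via tc/netem.
    def shape_link(interface="eth0", delay_ms=1, rate_mbit=3):
        subprocess.run(
            ["tc", "qdisc", "add", "dev", interface, "root", "netem",
             "delay", f"{delay_ms}ms", "rate", f"{rate_mbit}mbit"],
            check=True,
        )

    if __name__ == "__main__":
        shape_link()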

I took three things away from the bandwidth/CPU part of the presentation, so that’s what I want to share here. Honestly, I can probably write another article or two with the information in this one session. If you get a chance to see either of these guys repeat it at an event, you should go! 

1. When to use traditional protocols versus H.264

Speaking generally, they landed on the following rule: office workloads typically perform better from a CPU, bandwidth, and user experience perspective when using bitmap remoting as opposed to H.264. H.264 outperformed bitmap remoting on bandwidth and user experience in graphically-intensive scenarios such as video playback and 3D applications, though it still had higher CPU consumption. This statement applies primarily to Citrix, especially in light of the recent changes to Blast, so it’s worth keeping in mind if you’re a Citrix shop.

2. It’s evident why VMware is focusing on Blast

One consistent theme throughout the data presented is that PCoIP is a CPU hog compared to HDX. Though the bandwidth numbers were relatively similar (PCoIP edged out HDX in the 1024x768 tests with no protocol tuning, but lost out on the others), it consistently used around twice as much CPU as HDX. The gap narrows a bit when comparing PCoIP to ThinWire Plus, but it’s still pretty significant (15% versus 9%).
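
To put that per-session gap in perspective, here’s some back-of-the-napkin density math using the session’s 15% and 9% figures. The host size and the 80% CPU ceiling are my own assumptions, purely for illustration:

    # Rough host-density math from the per-session CPU figures (each
    # session runs in a 2 vCPU VM, so 15% CPU = 0.30 cores of work).
    # The 24-core host and 80% ceiling are illustrative assumptions.
    HOST_CORES = 24
    VCPUS_PER_VM = 2
    CPU_CEILING = 0.80

    def sessions_per_host(per_session_cpu):
        cores_per_session = per_session_cpu * VCPUS_PER_VM
        return int(HOST_CORES * CPU_CEILING / cores_per_session)

    print(sessions_per_host(0.15))  # PCoIP: ~64 sessions
    print(sessions_per_host(0.09))  # ThinWire Plus: ~106 sessions

Nothing scientific there, but it shows why a few points of per-session CPU turn into real differences in consolidation ratios.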

Given that VMware has only so much influence over PCoIP, it’s not likely that the gap will narrow any time soon, and I believe this is one of the reasons they put so much effort into the Blast protocol. More on that at the end of the article.

3. The “+” in ThinWire Plus means more bandwidth

We knew that Citrix had to go back to the drawing board with ThinWire for Windows 8, and when they did, they focused on achieving the same user experience as ThinWire Legacy. They succeeded, but it costs you dearly in the bandwidth department. Running the same workload on Windows 7 and Windows 10 with aggressive graphics optimizations, ThinWire Legacy saw bandwidth utilization of 74Kbps and CPU utilization of 8%. ThinWire Plus, on the other hand, clocked in at 194Kbps of bandwidth and 9% CPU utilization.
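
To see what that difference means for a branch office, here’s the same kind of quick math applied to WAN links. The link sizes and the 70% utilization target are my assumptions, not anything from the session:

    # Concurrent sessions per WAN link at the two ThinWire averages
    # (74Kbps vs. 194Kbps). Link sizes and the 70% utilization target
    # are illustrative assumptions.
    LINKS_KBPS = {"T1 (1544Kbps)": 1544, "10Mbps": 10000}
    TARGET_UTILIZATION = 0.70

    for link, capacity in LINKS_KBPS.items():
        for protocol, per_session in (("ThinWire Legacy", 74), ("ThinWire Plus", 194)):
            sessions = int(capacity * TARGET_UTILIZATION / per_session)
            print(f"{link}: {protocol} -> ~{sessions} sessions")
    # A T1, for example, goes from ~14 sessions to ~5.

Same workload, same user experience, roughly a third of the sessions per link.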

This data was the most eye-opening information of the entire session for me (well, apart from the fact that even though Citrix gives you optimization templates, none of them are enabled out of the box… but that’s an article for another day). If you were already stretching your bandwidth for remote locations, the simple act of switching from Windows 7 to Windows 10 can have a severe impact on your users. The good news is that user experience and host density won’t be affected, but make sure you have the bandwidth to pull it off!

What about Blast?

You can see why VMware has put so much effort into Blast. In fact, VMware suggests that Blast 2.0 has seen significant improvements in both bandwidth and CPU usage compared to the older version. Absent independent testing it’s hard to say exactly what those numbers are and how they compare with all the information above, but from the sound of it, VMware set their sights on Citrix and succeeded in matching, possibly exceeding, some of the ThinWire numbers.

Wrap-up

Ok, maybe that last bit was a protocol wars section, but it really only serves to reinforce that there is a protocol for your needs, and that any one of them is probably good enough. Still, it’s important to know the implications of using each one, so hopefully this article (and Nick & Dan’s session) was helpful. As the numbers on Blast 2.0 start coming out, we’ll be sure to circle back and post them here. 

Comments

I know that brianmadden is always pro-Citrix, but why did you leave out the RAM numbers and only mention the CPU numbers? RAM and CPU are often opposites, so I was wondering: are the RAM numbers so high for HDX that you couldn’t mention them?

Gabe,

The data in this presentation refutes your conclusion that VMware created Blast to save CPU in the VM. In all the measurements, Blast H.264 and PCoIP range from being within measurement error of each other to Blast being higher by almost 50%. And this is for a simple workload with no video or 3D content, which would only make the Blast CPU load worse. This is an inherent limitation of the H.264 compression algorithms, and it's the reason that Citrix changed their default back to ThinWire rather than their previous default of the H.264 "SuperCodec".

The CPU impact of H.264 significantly hurts consolidation ratios on VDI servers. Thus, if you want to use Blast, you need an NVIDIA M10 or M60 GPU in the system to maintain consolidation ratios (and that will currently limit you to a single monitor per VM based on the current VMware code).

Note that VMware has not claimed any CPU improvements in the 7.0.2 release that just came out. They claimed that JPEG/PNG mode (not H.264 mode) now uses 6X less bandwidth on video content. If you look at my BriForum presentation, you will see that JPEG/PNG used ~22X more bandwidth than Blast H.264, so a 6X improvement still leaves it >3.5X worse (don't you love marketing...), and we know that the video image quality had to drop dramatically as well.

Their other claims for Blast improvements were a 15% audio bandwidth reduction from using OPUS compression (which was released in PCoIP over two years ago) and reduced latency when using H.264 offload on M10 or M60 GPUs. We're in the middle of some LoginVSI benchmarking of the latest release and will be glad to share it with you when we're done.

You do make a good point about the CPU load compared to ThinWire. We were quite surprised by these results but have not dug into what's behind them. Note that the CPU component is made up of two parts: the soft GPU used to render the pixels (VMware uses a different one than Citrix) and the compression done by the codec. We will need to dig in to determine the root cause, as the CPU differences in the past have never been this significant.

Kaboom,

I didn't include RAM numbers because the differences were not as much of a concern as CPU, but there was some difference. H.264 used more RAM for both VMware and Citrix: Blast was ~100MB more than PCoIP, and HDX H.264 was ~75MB more than Adaptive Display. In all instances my Citrix workloads used at least ~150MB less memory than my VMware workloads, but I wouldn't read too much into that.

Cheers,

Dan
