Recently, we posed some questions via email to VMware Sr. Product Manager Warren Ponder and Teradici CTO Randy Groves about the inner workings of both the hard and soft PC-over-IP implementations used in VMware View. The answers we received were so in-depth that we decided to publish them as an interview rather than rewriting them into article format. Below are the questions we asked and the thorough answers we received:
(BrianMadden.com) The first question is what happens when you have a hardware client connecting to a software host? Does the software host recognize it’s a software client that can’t handle GDI primitives, which causes the software host to sort of emulate what the old hardware host did and send more pixels and screen scrapes instead of GDI stuff? If so, how does that impact user experience, host load, and bandwidth consumption? Or does the firmware update for the various hardware clients allow them to emulate the software client and process GDI-type stuff locally in addition to them being able to receive the full screens from remote hosts? If so, does that have any effect on experience?
(Warren Ponder & Randy Groves) The PCoIP protocol relies completely on host rendering of all pixels (except when MMR comes into play). This means all GDI primitives and other graphics commands are rendered by the CPU in a VMware View desktop or by the GPU in a workstation with a hardware host. GDI primitives can be efficiently rendered with a CPU (<2% CPU overhead for most office applications). CPU rendering becomes intensive only when decoding video or running 3D applications. A key advantage of host-side rendering is that it guarantees compatibility with all current and future applications and video CODECs because there is no dependency on the network or client device. For example, with Windows 7, WPF, and applications based on .NET 3.5 SP1, certain primitive-remoting benefits go away.
The PCoIP protocol takes the raw pixels from the frame buffer, categorizes the pixels into different image types (image decomposition), and then compresses the pixels using a compression CODEC most appropriate for each image type. At a high-level, this process is identical between hardware and software hosts.
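The decompose-then-compress flow described above can be sketched roughly as follows. This is a minimal illustration only: the block representation, the palette-size heuristic, and the codec names are assumptions for demonstration, not Teradici's actual algorithm.

```python
# Hypothetical sketch of PCoIP-style image decomposition: classify each
# pixel block by image type, then pick a codec suited to that type.
# The heuristic (palette size) and codec labels are illustrative only.

def classify_block(block):
    """Guess the image type of a pixel block from its palette size."""
    unique_colors = len(set(block))
    if unique_colors <= 2:
        return "text"        # sharp, two-tone content: keep lossless
    elif unique_colors <= 16:
        return "graphics"    # UI elements, flat fills, simple gradients
    else:
        return "picture"     # photo/video content: lossy is acceptable

CODEC_FOR_TYPE = {
    "text": "lossless",
    "graphics": "near-lossless",
    "picture": "lossy-wavelet",
}

def encode_frame(blocks):
    """Return (codec, block) pairs for a frame split into pixel blocks."""
    return [(CODEC_FOR_TYPE[classify_block(b)], b) for b in blocks]

frame = [
    [0, 0, 255, 255, 0, 255],                      # black/white text block
    list(range(10, 180, 10)),                      # many colors: picture
]
for codec, _ in encode_frame(frame):
    print(codec)
```

The point of the sketch is the separation of concerns: classification happens per block on the host, and the wire carries pixels compressed per type, which is why both hardware and software hosts can share one protocol.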
Because everything is host-side rendered, the client devices are nothing more than simple decompression or decoder engines. This is analogous to how a digital TV works by taking in pixels that have been compressed using MPEG or H.264 CODECs and decoding them. The difference is that digital TV CODECs like MPEG and H.264 are inefficient for common desktop image types like text and computer graphics. Your digital TV doesn’t render pixels; pixels are all created in the TV studio. Likewise, with host-side rendering, a PCoIP client only has to decode compressed pixels using the PCoIP CODECs.
The primary difference between a PCoIP zero client and a PCoIP software client lies in the maintainability and security aspects of a device that does not have an operating system or browser which must be patched monthly and requires anti-virus/spyware protection. While PCoIP zero clients can have their firmware upgraded to incorporate new features, this is at the sole discretion of the customer based on whether they need the new features, as opposed to security patches that must be applied regardless. Teradici and VMware are committed to full backwards compatibility, so any zero client installed today with firmware version 3.0 or later will continue to interoperate with all future VMware View releases. Firmware upgrades will be limited to new features, not new releases from VMware. Zero clients also offer a more assured experience: it is often challenging to guarantee exactly the same performance using soft clients because of possible resource contention on the client device when competing with other services or software.
Apart from performance, there are some other differences between soft and zero clients. For supported video types, software clients can use MMR to decode the video on the client instead of the host, whereas zero clients based on Tera1 silicon chips will require all video to be decoded at the host. Some other feature differences exist between the current versions of zero client firmware and the software client. For example, the software client supports virtual printing and some USB devices that the current zero client firmware does not support when connected to a VMware View desktop (note, the zero client supports all USB devices when connected to a hardware host). Over future releases, these gaps will be addressed.
Along those lines, is there any difference on the wire between a software host sending to a software client versus a hardware client? How does that impact user experience, host load, and bandwidth consumption?
The PCoIP protocol is identical on the wire when using either a soft or zero client. However, various protocol features are optimized based on both the host and the client. For example, when a hardware host is connected to a zero client, image decomposition is at the pixel level. However, if either end point is software, then image decomposition becomes more pixel-block oriented. This is done to reduce the CPU utilization for the software host/client and uses the exact same protocol on the wire. The software host uses the exact same optimizations whether it is connected to a soft or a zero client, whereas the hardware host uses different optimizations when connected to a soft client than a zero client.
Even though a software host uses the same optimizations with either client, the load on the host can be slightly higher for a zero client than for a soft client. This is because the zero client can process pixels much faster than soft clients (>30X in some cases). This allows the software host to process pixels at full speed most of the time. With a software client, the software host cannot generate pixels any faster than the client is capable of processing them, so the host CPU load of the PCoIP software encoder is partially dependent on the performance of the attached client. This difference is also evident between slower and faster soft clients. In fact, RDP, RFX, and ICA will also have slightly higher host loading with faster client devices for the same reason. Of course, this slight increase in host CPU load delivers a “snappier” experience to the client, which is a good thing. That said, this CPU impact is a second-order effect at best and only applies if the network is not constraining performance.
With a software host and software client, is there any kind of client-side caching? What about with a hardware client? The reason I ask is that during our Geek Week (with a software host and hardware client), we noticed that when scrolling in Word, if we scrolled slowly, we’d only get the build-to-lossless on the few “new” lines that scrolled onto the page... The lines that were already on the screen before we started the scroll action stayed crisp. But if we scrolled faster, the entire screen went back to build-to-lossless, even though some of the lines were already on the screen. What’s going on there?
As part of image decomposition, the PCoIP protocol makes use of motion estimation and compensation. The host side attempts to detect groups of pixels that have moved between screen changes. This is called motion estimation for some subtle technical reasons, but it’s probably easier to think of it as motion detection. The details of any detected motion are sent to the client device which copies the pixels from their original location to their new location. This is a form of caching, but currently only applies to pixels that were visible in the immediate prior screen.
As you can imagine, motion detection could be very CPU-intensive if you search every possible position on the screen (e.g., 2+ million comparisons for each pixel block on a 1920x1080 screen). Since it is much more likely that a block of pixels has moved only a small distance between screens, most motion estimation algorithms limit their search to a region of nearby pixels rather than the full screen. The effect you are seeing when you scroll fast enough is caused by the pixels moving beyond the search region of the current motion estimation algorithm in the software host. The next release of the software host will have a new algorithm that searches a wider region with minimal CPU impact and will detect a broader range of motion.
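The windowed search just described can be sketched with a simple block matcher. This is a generic exhaustive-search sketch under assumed parameters (block size, search radius, exact-match comparison), not the PCoIP implementation: a block is "found" only if it moved within the search window, so a fast scroll that jumps past the window falls back to re-encoding, exactly the behavior seen in the Word test.

```python
# Sketch of windowed block motion search: look for the current block in
# the previous frame within +/- search_radius pixels. Outside the window
# the block is treated as new and must be re-encoded (build-to-lossless).
# Block size, radius, and exact matching are illustrative assumptions.

def find_motion(prev, curr, by, bx, bsize, search_radius):
    """Return (dy, dx) if the block at (by, bx) in curr matches a block
    in prev within the search window, else None."""
    block = [row[bx:bx + bsize] for row in curr[by:by + bsize]]
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            sy, sx = by + dy, bx + dx
            if sy < 0 or sx < 0 or sy + bsize > len(prev) or sx + bsize > len(prev[0]):
                continue
            candidate = [row[sx:sx + bsize] for row in prev[sy:sy + bsize]]
            if candidate == block:
                return (dy, dx)  # motion vector: client just copies pixels
    return None                  # outside the window: re-encode the block

prev = [[r * 10 + c for c in range(8)] for r in range(8)]
slow_scroll = prev[1:] + [prev[0]]   # content shifted up by one row
fast_scroll = prev[4:] + prev[:4]    # content shifted up by four rows
print(find_motion(prev, slow_scroll, 0, 0, 2, 2))  # found within the window
print(find_motion(prev, fast_scroll, 0, 0, 2, 2))  # jumped past the window
```

Note that the cost lives entirely in the host-side search loop; the client-side copy is a fixed-cost operation regardless of how far the vector points, which is why widening the search window needs no client changes.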
An important thing to note is that motion compensation on the client side has no CPU limitations as to how far the pixels can be moved, since it takes just as much effort to move a block of pixels one pixel away as it does to move them to the other side of the screen. Thus, as the host-side motion estimation algorithms improve, the client-side devices will be able to easily take advantage of these improvements.
Thanks to Randy and Warren for taking the time to answer our questions. Both have been known to comment here, so if you have more questions, post them below and we'll see about getting an answer.