How HTML5 remote desktop clients work

Time to break down HTM5 remote desktop clients as a follow up to my article around the Chromebook.

Earlier this week, I wrote about my initial thoughts on the Chromebook, and I talked a little bit about HTML5 remote desktop clients, specifically AccessNow from Ericom. In the comments, we also heard from the creator of Spark View, Walter Wang. Walter's comment, plus a subsequent phone call with Ericom, helped to shed some light on exactly how Chrome (and other HTML5 compliant browsers, which is all of the big ones now, I think) use HTML5 technologies to show remote desktops. In that article, I speculated that Ericom was somehow wrapping RDP and shipping it to the client. It turns out that what actually is happening is a bit more complex, and it involves translating RDP data for consumption by the browser. Before I get too far ahead, though, let's break this down.

There are two key technologies that enable remote desktop clients within a browser, WebSockets and Canvas. WebSockets is how the remote desktop data is sent from your environment to the browser, and Canvas is the technology that allows it to be redrawn on the screen.

WebSockets is a protocol/API that is built in to all the recent browsers that allows for continuous transmission of data via one TCP socket, as opposed to HTTP, which requires each request to have a response. Multiple requests, then, require multiple connections, which is pretty complex and inefficient for anything that needs to have a realtime feel to it. WebSockets changes this by essentially opening a channel between the client and the server that remains open between requests. The main drawback of WebSockets is that it only supports textual data, not binary data (which is what remote protocols use), which we'll get into later.

Canvas was created by Apple way back in 2004, and has grown into being a native HTML5 element. Canvas enables the ability to control every single pixel discretely through the use of javascript, which allows the browser to render 2d graphics dynamically. When you see animations or games that play in the browser and don't use Flash (i.e. HTML5 games like Angry Birds for Chrome), you're seeing Canvas in action. For remote desktop connections, the client (in this case, mostly a javascript program) consumes the data coming in via WebSockets and draws the desktop on the screen via Canvas.

Right now you may be thinking " binary data support...that's not RDP at all," which is absolutely correct. But if what you're using at the client isn't RDP, then how is this working? The secret there is with a gateway of sorts. Ericom calls this AccessNow Server (which is really just a lightweight service), and Spark View calls it a Spark Gateway. In both cases, these gateways establish an RDP session with the remote host and translate (or re-encode) that binary data into textual data for use with WebSockets. That text data is sent on to the browser where the client interprets that data and draws it on the screen with Canvas.

The entire process looks something like this (click for larger image):

Ericom has also introduced a version of AccessNow that works with VMware View. There's an added step that involves hosting the web client on a View server so that it can take advantage of the View Open Client, which handles authentication and desktop selection before handing the connection off to the AccessNow Server (remember, that's more of a service than a server). Ultimately, they view this as a way to expand endpoint support for VMware View to anything with an HTML5-compliant browser, which will level the playing field with Citrix when it comes to number of client devices supported by the platform.

At this point, AccessNow does not support virtual keyboards like what you would find on iOS or Android devices. It appears that only Spark View supports those types of devices today, although I haven't had a chance to actually look at the product yet. We know Walter reads this blog, though, so maybe he can comment :) Ericom has said that they are close to providing it, they just want to make sure they get it right before releasing the next version.

Since the Citrix HTML5 client hasn't been released yet, I'm not sure how it works. I imagine it has the same basic architecture, though, while utilizing some of Citrix's existing components (web interface, connection broker, NetScaler, etc...). It's my plan to do a HTML5 remote desktop client roundup when Citrix releases theirs, but if that winds up being too far out, I'll do it without them. It's all so new, though, it seems only fair to give it a little more time.

Based on his comment, Walter believes that WebSockets will be amended to include binary data at some point in the future, which may or may not eliminate the need for a gateway in the middle. There's not much doubt, though, that this article will be obsolete in the near future as more advances are made with HTML5 and remote desktop connections. Call it "Job Security,"  I guess :)