With Google’s release of Project Stream this month, there has been a lot of interest in low latency game streaming for cloud gaming. Google’s first entry into the arena is Ubisoft’s Assassin’s Creed Odyssey, with the kicker that it runs from within Chrome! This kind of makes sense as a distribution channel for Google over a native application, given Google’s Chrome/Chromebook empire.
We’ve been tinkering with a browser client for Parsec for a while now, and have noticed Chrome’s performance steadily improving, particularly regarding networking and media playback (via WebRTC and Media Source Extensions). While doing game streaming in the browser has some current disadvantages compared to our native applications, it has been nicely playable for every game we’ve tried, so we decided to make our work public and share what we’ve learned in this blog post.
The repo is available on GitHub and is written in lean vanilla ES2018 for simplicity — I am obligated to mention that there was a strong internal lobby at Parsec for TypeScript, so we may convert it at some point in the future…
If you’re interested in other Parsec tech, take a look at our blog post from 2016 to learn a bit more.
Here is the general progression:
- Make the peer-to-peer connections via WebRTC
- Send the client config to the Parsec host, initialize the stream
- Start receiving video, audio, and control messages
- Punt video to a <video> element via Media Source Extensions
- Decode audio via Opus and play via Web Audio API
- Collect input/gamepad events, pack them in a binary format, and send to the host [input.js, gamepad.js, msg.js]
- Have fun!
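As a rough illustration of how the incoming data fans out, here is a minimal sketch of routing messages by data channel label. The labels and handler shapes are hypothetical, not the repo’s actual names:

```javascript
// Hypothetical sketch: route incoming RTCDataChannel messages by channel label.
// The labels ('video', 'audio', 'control') are illustrative only.
function makeRouter(handlers) {
  return (label, data) => {
    const handler = handlers[label];
    if (!handler) throw new Error(`no handler for channel '${label}'`);
    return handler(data);
  };
}

// In the browser, wiring it up would look something like:
//   channel.onmessage = (evt) => route(channel.label, evt.data);
```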
The rest of the post will break out each of these steps separately with more detail — we had to sacrifice depth for breadth in this post, but if you’re interested in more detail please let us know in the comments!
When it comes to performant peer-to-peer networking in the browser, there is really only one show in town: WebRTC. While it is possible to stream via WebSockets, they have many disadvantages compared to WebRTC, particularly when it comes to NAT traversal and TCP-based congestion control. Our web client implementation uses RTCDataChannels to communicate with the Parsec host, which allow arbitrary messages to be sent over a peer-to-peer connection. Under the hood, an RTCDataChannel is UDP wrapped in an SCTP stream wrapped in DTLS for security. While streaming, visit chrome://webrtc-internals to see the data channels in action.
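For a sense of what the setup looks like, here is a minimal sketch of opening the data channels on an RTCPeerConnection. The channel names are our illustration, not necessarily what the web client uses:

```javascript
// Sketch (channel names assumed): open data channels on an RTCPeerConnection.
// Under the hood each channel is SCTP over DTLS over UDP.
function openChannels(pc) {
  // Ordered, reliable delivery is the RTCDataChannel default; the option
  // shown here is just the explicit equivalent.
  const control = pc.createDataChannel('control', { ordered: true });
  const video = pc.createDataChannel('video', { ordered: true });
  const audio = pc.createDataChannel('audio', { ordered: true });
  return { control, video, audio };
}

// In the browser:
//   const pc = new RTCPeerConnection({ iceServers: [{ urls: 'stun:stun.l.google.com:19302' }] });
//   const { video } = openChannels(pc);
//   video.binaryType = 'arraybuffer';
```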
I think anyone’s first reaction when looking at WebRTC is that it is unnecessarily complex for what it does — it’s true that there is some unnecessary boilerplate in SDP and probably too much wrapping going on (4 handshakes!?), but considering what it’s doing is asynchronous, error prone, and incredibly complex I’m willing to cut it some slack 😃.
There is a lot of complexity in NAT traversal and first making the peer-to-peer connection (which at the end of the day boils down to a simple STUN ping/pong as part of the UDP hole punching procedure). This initial handshake requires the upfront exchange of security credentials, which is performed via signaling through a WebSocket.
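As a rough sketch, the signaling messages can be as simple as JSON envelopes shuttled over the WebSocket. The message shape here is an assumption, not Parsec’s actual signaling protocol:

```javascript
// Hypothetical signaling envelope: SDP offers/answers and ICE candidates are
// serialized as JSON and exchanged over the WebSocket before the peer
// connection comes up.
function encodeSignal(type, payload) {
  return JSON.stringify({ type, payload });
}

function decodeSignal(raw) {
  const msg = JSON.parse(raw);
  if (msg.type !== 'offer' && msg.type !== 'answer' && msg.type !== 'candidate') {
    throw new Error(`unknown signal type '${msg.type}'`);
  }
  return msg;
}

// In the browser, pc.onicecandidate would call
//   socket.send(encodeSignal('candidate', evt.candidate));
// and socket.onmessage would hand decoded offers/answers to pc.setRemoteDescription.
```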
Parsec’s native clients use our BUD protocol, a custom UDP protocol we’ve built specifically for high throughput, zero-buffered streaming over the past few years. BUD additionally uses a few more native “tricks” during NAT traversal, e.g. more aggressive hole punching and optional client-side UPnP. We considered putting the data channel in unreliable mode and shipping a WebAssembly-compiled BUD implementation as part of the web client, but for the sake of openness we decided to leave the connection on the default DTLS/SCTP. In the future we expect to further integrate BUD into the web client to make the networking more robust in less-than-ideal conditions.
On the Parsec host side of things, rather than building and deploying WebRTC proper as a dependency with our app (it’s a behemoth), we did the work to make BUD’s current NAT traversal “ICE compatible”. We’re sort of OCD when it comes to the cleanliness of the binary, and like keeping it lean and mean.
There is so much to talk about regarding peer-to-peer networking that it could easily fill its own lengthy blog post, so we’ll leave it at that for now. Most of the strategy is made clear by looking at [signal.js]. If anyone is interested in a deeper “dismantling” of WebRTC (the “why” rather than the “what”), please let us know in the comments!
The video comes through via its own data channel. While the Parsec native applications handle the decode/render pipeline manually to ensure hardware support and no added latency, the closest equivalent in Chrome is punting the video frames to an HTML <video> element via Media Source Extensions.
The current implementation only works in Chrome. Before you assume we’re Kool-Aid-drinking Google evangelists, or just lazy, there is good reason for this: Chrome supports a special “low delay” mode for MSE that sets up a push model for video frames rather than the traditional buffered pull model. This is good for any kind of low latency video stream, not just game streaming. When in low delay mode, Chrome begins to break the rules of MSE: it no longer requires buffered playback, and it starts to ignore certain timing information and keyframe requirements.
EDIT: Google notified us that MSE “low delay” and its low latency performance is not related to Project Stream — we apologize for the speculation in the original draft.
Low delay mode can be observed by looking at chrome://media-internals while streaming.
This is not to say that one couldn’t get a decent working implementation in Firefox, but Chrome’s low delay mode works in the ideal way without having to complicate the implementation or diverge too heavily from the way Parsec’s native applications behave. And while we love Firefox, the harsh reality is that 82% of Parsec users are using Chrome, with only 5% using Firefox, which made us more comfortable starting with a Chrome only implementation. For Firefox users, you can always use the Parsec native applications, which will probably perform better anyway 😐.
The heavy lifting in prepping and timing the frames in boxed MP4 is performed on the Parsec host, and it’s as simple as dropping the messages directly into MSE via appendBuffer. Some care is taken to prevent MSE from complaining, but that’s really all there is to it.
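One piece of that care: appendBuffer throws if called while the SourceBuffer is still updating, so incoming frames need a small queue. A minimal sketch, where makeAppender is a hypothetical helper rather than code from the repo:

```javascript
// Sketch of queued appends into an MSE SourceBuffer. appendBuffer() throws an
// InvalidStateError while sourceBuffer.updating is true, so frames are queued
// and flushed on 'updateend'.
function makeAppender(sourceBuffer) {
  const queue = [];
  const flush = () => {
    if (!sourceBuffer.updating && queue.length > 0) {
      sourceBuffer.appendBuffer(queue.shift());
    }
  };
  sourceBuffer.addEventListener('updateend', flush);
  return (frame) => {
    queue.push(frame);
    flush();
  };
}

// In the browser: the video data channel's onmessage handler would call the
// returned function with each boxed-MP4 fragment.
```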
The video is halted when the browser tab loses visibility to save bandwidth/power, but then is reinitialized when visibility is regained.
The audio comes through via its own data channel in 20 ms frames at a 48 kHz sample rate. It currently comes in as raw encoded Opus and is decoded via the Opus library compiled to WebAssembly [wasm/opus]. The audio is then played via the Web Audio API, with care taken to ensure proper timing and prevent overbuffering.
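The timing bookkeeping boils down to a little arithmetic: each 20 ms Opus frame is 960 samples at 48 kHz, decoded buffers are scheduled back-to-back, and a frame is dropped if too much audio is already queued. A sketch of that logic (the threshold is illustrative, not the repo’s actual value):

```javascript
// Web Audio scheduling sketch for 20 ms Opus frames at 48 kHz.
const SAMPLE_RATE = 48000;
const FRAME_SAMPLES = 960; // 20 ms at 48 kHz

function frameDuration() {
  return FRAME_SAMPLES / SAMPLE_RATE; // 0.02 s per frame
}

// Schedule the next buffer at the end of the previous one, unless playback has
// underrun, in which case snap forward to "now" instead of scheduling in the past.
function nextStartTime(prevEnd, now) {
  return Math.max(prevEnd, now);
}

// Prevent overbuffering (and thus added latency): if more than ~100 ms of audio
// is already queued ahead of the clock, drop the incoming frame.
function shouldDrop(prevEnd, now, maxBufferedSec = 0.1) {
  return prevEnd - now > maxBufferedSec;
}

// In the browser, each decoded frame becomes an AudioBufferSourceNode:
//   const start = nextStartTime(prevEnd, audioCtx.currentTime);
//   source.start(start);
//   prevEnd = start + frameDuration();
```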
This is one area of the implementation that we will be moving away from shortly. Chrome is adding support for Opus in MP4 via MSE in Chrome 70, which is a better solution, so we plan to shift the strategy once the time is right. It will work similarly to how the video is pushed, except to an <audio> element.
Input / Messages
Input (keyboard, mouse, gamepad) and arbitrary messaging (cursor, chat) are performed via their own data channel. The mouse/keyboard input is collected via the usual suspects in the browser (keydown, etc.), while the gamepad input is collected via the Gamepad API. The notable difference is that mouse and keyboard events are fired via listeners, while gamepad state must be polled.
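A minimal sketch of the polling side, with navigator.getGamepads injected so the diffing logic stands alone (the button handling is illustrative, not the repo’s code):

```javascript
// Gamepad state must be polled (typically once per requestAnimationFrame).
// Each poll diffs button states against the previous poll and reports changes.
function pollGamepads(getGamepads, prevStates, onButton) {
  for (const pad of getGamepads()) {
    if (!pad) continue; // getGamepads() can return null slots
    const prev = prevStates[pad.index] || [];
    pad.buttons.forEach((btn, i) => {
      if (btn.pressed !== !!prev[i]) onButton(pad.index, i, btn.pressed);
    });
    prevStates[pad.index] = pad.buttons.map((b) => b.pressed);
  }
}

// In the browser, a requestAnimationFrame loop would call:
//   pollGamepads(() => navigator.getGamepads(), states, sendButtonMsg);
```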
Each message is packed in a binary format that makes sense to the Parsec host. The cursor message that gets sent from the host to the client carries extra information over the usual messages.
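As an illustration of what that packing looks like, here is a hypothetical layout for a key event (the real wire format is defined by the Parsec host; this only demonstrates DataView packing):

```javascript
// Hypothetical binary layout: 1-byte message type, 2-byte key code
// (little-endian), 1-byte pressed flag. Not Parsec's actual format.
const MSG_KEYBOARD = 1; // illustrative type tag

function packKeyMsg(code, pressed) {
  const buf = new ArrayBuffer(4);
  const view = new DataView(buf);
  view.setUint8(0, MSG_KEYBOARD);
  view.setUint16(1, code, true); // key code, little-endian
  view.setUint8(3, pressed ? 1 : 0);
  return buf;
}

// In the browser, a keydown listener would do something like:
//   inputChannel.send(packKeyMsg(evt.keyCode, true));
```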
When playing a game that shifts in and out of relative mouse mode (Pointer Lock in browser parlance), it is essential that the state is accurately reflected in the client. The Parsec host performs this detection and sends cursor messages to the client whenever there is a cursor state change of any kind. This way the cursor image is always up to date, and the client can seamlessly shift in and out of relative mode. Without a strategy to handle this behavior, certain games (especially MMOs) are unplayable. The browser is somewhat limited here in that it places restrictions on when you can enter Pointer Lock (you wouldn’t want some random clickbait grabbing your cursor and not letting go 😠), but when entering fullscreen via webkitRequestFullscreen, the limitations go away.
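A sketch of tying the two together: request pointer lock on entering fullscreen and release it on exit. The element/method names follow the standard Pointer Lock and Fullscreen APIs, but the wiring is our assumption, not the repo’s exact code:

```javascript
// Sketch: keep pointer lock in sync with fullscreen state. Inside a fullscreen
// change triggered by a user gesture, requestPointerLock() is permitted.
function syncPointerLock(element, doc) {
  doc.addEventListener('fullscreenchange', () => {
    if (doc.fullscreenElement === element) {
      element.requestPointerLock();
    } else {
      doc.exitPointerLock();
    }
  });
}

// In the browser: syncPointerLock(videoElement, document);
```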
The way we go about things is by no means the only way to approach this issue, but since cursor latency is especially noticeable, we’ve opted for giving the cursor a snappy local feel so you are not constantly reminded you are streaming your game remotely.
The web client showcases some interesting recent developments in browser tech, while providing a convenient way to get involved with Parsec without having to download anything. That being said, for the best experience we still recommend the Parsec native apps, which have had years of optimization for each platform (Windows, macOS, Linux, Raspberry Pi, and Android) and can take advantage of our game-streaming-optimized networking protocol, BUD.
The web client is totally embeddable, which means you can essentially build your own Parsec client around it or integrate it within your website. It also opens the door to certain automation tasks by exposing/documenting the network interface with the host. Hopefully interesting things happen!
If you’d like to use the web client within the Parsec app, it is available under the Experimental section in Connection Settings. Any feedback is greatly appreciated, and of course hit up the GitHub repo with issues (or PRs!) if you want to get involved with the code.
And as always, hit us up on Discord with questions, we’re always hanging around.