Last year, Facebook rolled out ‘Live for Facebook Mentions’, a feature that lets public figures broadcast live video to their legions of fans, delivered through Facebook’s own CDN. The feature proved popular, but it also brought a host of vexing logistical problems: Facebook had to absorb huge traffic spikes and balance load when millions of fans tried to watch the same live video at once.
One of the major problems Facebook faced in building a high-scale video distribution system was handling the stampede of requests that comes with a popular broadcast. This kind of surge, dubbed the “thundering herd” problem, can cause lag, dropouts, and disconnections from the stream.
Rather than let clients connect directly to the live stream server, Facebook set up a globally distributed network of edge caches.
In this setup, a live stream is split into three-second HLS segments, which the video player playing the broadcast requests one after another. Each request is handled by an HTTP proxy in an edge data center, which checks whether the segment is already in the edge cache. If it is, the segment is returned directly from the edge cache. If not, the proxy sends an HTTP request to the origin cache, another cache layer built on the same architecture, which returns the segment in its HTTP response (fetching it from the live stream server first if it does not hold the segment either).
On the way back, the segment is cached at each layer, so subsequent clients receive it faster. As a result, more than 98% of segments are already in an edge cache close to the user when they are requested, which reduces the pressure on the origin server.
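The lookup path described above is a read-through cache chain. The Python sketch below models it under simplifying assumptions: the HTTP proxy and its cache are merged into one hypothetical `CacheLayer` class, requests arrive one at a time, and `live_stream_server` is a stand-in function rather than anything from Facebook’s system. Each layer answers from its own store on a hit and, on a miss, asks the next layer up and caches the answer on the way back.

```python
from typing import Callable, Dict


class CacheLayer:
    """Hypothetical read-through cache tier (edge or origin), proxy and cache merged."""

    def __init__(self, name: str, upstream: Callable[[str], bytes]) -> None:
        self.name = name
        self.upstream = upstream            # next tier up: origin cache or stream server
        self.store: Dict[str, bytes] = {}   # segment ID -> segment bytes
        self.hits = 0
        self.misses = 0

    def get(self, segment_id: str) -> bytes:
        if segment_id in self.store:
            self.hits += 1
            return self.store[segment_id]   # served directly from this tier
        self.misses += 1
        data = self.upstream(segment_id)    # forward the miss one tier up
        self.store[segment_id] = data       # cache on the way back
        return data


server_calls = []  # segment IDs the (simulated) live stream server had to serve


def live_stream_server(segment_id: str) -> bytes:
    """Stand-in for the server handling the broadcast (hypothetical)."""
    server_calls.append(segment_id)
    return f"<HLS segment {segment_id}>".encode()


origin = CacheLayer("origin", live_stream_server)
edge = CacheLayer("edge", origin.get)

# 1,000 viewers request the same three-second segment, one after another.
for _ in range(1000):
    edge.get("stream42/seg_0001.ts")

print(edge.hits, edge.misses, origin.hits, origin.misses, len(server_calls))
# 999 1 0 1 1
```

Because these requests arrive sequentially, only the first one misses. If hundreds arrived at the same instant, each could miss before the first response was cached, which is exactly the thundering herd case that request coalescing targets.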
Even so, around 1.8% of requests still reached the origin, which is sizeable at this scale: for a broadcast drawing a million requests, that is 18,000 requests. To cope with this, Facebook used a technique called ‘request coalescing’. When a thundering herd of requests for the same segment hits an edge cache, only the first is treated as a cache miss and forwarded upstream; the rest are held in a queue. Once the HTTP response comes back, the segment is stored in the edge cache and the queued requests are served from it as cache hits. This reduces pressure on the origin cache.
The technique also works one level up: the origin cache runs the same mechanism to coalesce the requests it receives from multiple edge caches.
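Request coalescing can be sketched with `asyncio`, assuming each cache tier is an object whose `get` coroutine the tier below it can call. The class name `CoalescingCache`, the segment key, and the simulated server delay are all hypothetical. The first miss for a key starts a single upstream fetch; every concurrent request for that key, including the first, awaits that same fetch. The same class is reused for the origin tier, so two edges feeding one origin still produce one server request.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class CoalescingCache:
    """Hypothetical cache tier that sends at most one upstream fetch per key at a time."""

    def __init__(self, upstream: Callable[[str], Awaitable[bytes]]) -> None:
        self.upstream = upstream                      # next tier up (origin cache or server)
        self.store: Dict[str, bytes] = {}             # segments already cached at this tier
        self.in_flight: Dict[str, asyncio.Task] = {}  # key -> fetch currently in progress
        self.upstream_requests = 0                    # fetches actually sent upstream

    async def get(self, key: str) -> bytes:
        if key in self.store:
            return self.store[key]                    # cache hit
        task = self.in_flight.get(key)
        if task is None:
            # First miss for this key: only this request goes upstream.
            task = asyncio.create_task(self._fill(key))
            self.in_flight[key] = task
        # First and queued requests all wait on the same fetch. shield() keeps one
        # cancelled client from cancelling the fetch for everyone else.
        return await asyncio.shield(task)

    async def _fill(self, key: str) -> bytes:
        self.upstream_requests += 1
        try:
            data = await self.upstream(key)
            self.store[key] = data                    # cache on the way back
            return data
        finally:
            self.in_flight.pop(key, None)             # later misses start a fresh fetch


async def main() -> None:
    server_calls = 0

    async def live_stream_server(key: str) -> bytes:
        nonlocal server_calls
        server_calls += 1
        await asyncio.sleep(0.05)                     # simulated network/encoding delay
        return f"<segment {key}>".encode()

    origin = CoalescingCache(live_stream_server)
    edge_a = CoalescingCache(origin.get)
    edge_b = CoalescingCache(origin.get)

    # 500 viewers on each of two edges request the same segment at the same moment.
    requests = [edge.get("seg_0001") for edge in (edge_a, edge_b) for _ in range(500)]
    results = await asyncio.gather(*requests)

    assert all(r == results[0] for r in results)
    print(edge_a.upstream_requests, edge_b.upstream_requests,
          origin.upstream_requests, server_calls)  # 1 1 1 1


asyncio.run(main())
```

One fetch per tier per segment is the whole point: without the `in_flight` map, all 1,000 simultaneous misses would travel upstream.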
When it comes to live video from non-public figures, on the other hand, the thundering herd is less of an issue and reducing latency becomes the name of the game. Everyday users are more likely to broadcast to a small social circle, so near real-time interaction, without a noticeable delay between broadcaster and viewers, matters more.
To address this, Facebook enabled RTMP playback, cutting the latency of live broadcasts to a couple of seconds in the hope of making the viewing experience feel more immediate.
RTMP is a streaming protocol that keeps a persistent TCP connection open between the video player and the server for the entire broadcast. Unlike HLS, where the player pulls each segment by requesting it, RTMP uses a push model: the server continuously sends the client two streams, video and audio. The streams are written as 4 KB chunks, and video and audio chunks are interleaved over the same connection. At a video bit rate of 500 Kbps, each chunk holds only about 64 ms of video (4 KB × 8 bits ÷ 500 Kbps ≈ 64 ms), compared with three seconds per HLS segment, which makes for smoother streaming. As soon as 64 ms of video has been encoded, the broadcaster sends it out; the transcoding server processes it and produces multiple output bit rates, and the chunk is then forwarded through proxies until it reaches the player.
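To make the chunking concrete, the Python sketch below splits hypothetical video and audio byte streams into 4 KB chunks, interleaves them for a single connection, and computes how much playback time one chunk carries at a given bit rate. It treats 1 KB as 1,000 bytes and omits the headers that real RTMP chunks carry (chunk stream ID, timestamp and so on), so it illustrates the idea rather than the RTMP wire format; all function names are illustrative.

```python
from itertools import zip_longest
from typing import Iterator, List, Tuple

CHUNK_BYTES = 4_000          # "4 KB", taking 1 KB = 1,000 bytes
VIDEO_BITRATE_BPS = 500_000  # 500 Kbps video


def chunk(stream: bytes, size: int = CHUNK_BYTES) -> List[bytes]:
    """Split one elementary stream into fixed-size pieces (the last may be shorter)."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]


def interleave(video: bytes, audio: bytes) -> Iterator[Tuple[str, bytes]]:
    """Alternate video and audio chunks as they would share one TCP connection.

    Simplified: real RTMP chunk headers are omitted.
    """
    for v, a in zip_longest(chunk(video), chunk(audio)):
        if v is not None:
            yield ("video", v)
        if a is not None:
            yield ("audio", a)


def chunk_duration_ms(chunk_bytes: int, bitrate_bps: int) -> float:
    """Playback time carried by one chunk: bits divided by bits per second."""
    return chunk_bytes * 8 * 1000 / bitrate_bps


video = bytes(10_000)  # 10 KB of placeholder encoded video
audio = bytes(5_000)   # 5 KB of placeholder encoded audio
print([kind for kind, _ in interleave(video, audio)])
# ['video', 'audio', 'video', 'audio', 'video']
print(chunk_duration_ms(CHUNK_BYTES, VIDEO_BITRATE_BPS))  # 64.0
```

By comparison, one three-second HLS segment spans about 47 such chunks (3,000 ms ÷ 64 ms ≈ 47). Audio is encoded at a different bit rate, so its chunks cover a different span of time than the 64 ms figure for video.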