We have just finished our second interview with the CEO of Red5, a pioneer in live streaming. As part of our ongoing discussion with Red5, we’re going to dive into their auto-scaling feature, which automatically scales server instances to accommodate bursts in traffic. A big thanks to Chris Allen for the interview.
What is autoscaling?
Autoscaling is the ability to deploy servers in such a way that people can implement live streaming apps without worrying about scaling them. This way, you can deploy your own CDN on any cloud network that you can think of. It takes care of spinning up and down instances depending on the traffic load, and it does it automatically.
The architecture leverages a Red5 server instance acting as what we call a stream manager, which takes care of routing traffic to various streaming endpoints. We use an edge-origin model: one server, the origin, acts as a publishing point and pushes out streams to edge instances. Edge instances then stream to clients wanting to view those live streams. The stream manager manages the traffic between the clients and the servers and also live-monitors the edge and origin instances in the cluster. The stream manager tells each client to go to the optimal server; we use a REST API to do that kind of traffic management.
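The routing rule described above can be sketched in a few lines. This is a minimal illustration, not Red5's actual implementation; the cluster structure and field names (`origin`, `edges`, `load`) are assumptions made for the example.

```python
def route(role, cluster):
    """Route a client in the edge-origin model: publishers (broadcasters)
    go to the origin, subscribers go to the least-loaded edge.

    `cluster` is an illustrative structure:
        {"origin": "origin-host", "edges": [{"host": ..., "load": ...}, ...]}
    """
    if role == "broadcast":
        # Broadcasters always publish to the origin.
        return cluster["origin"]
    # Subscribers are sent to whichever edge currently has the lightest load.
    return min(cluster["edges"], key=lambda e: e["load"])["host"]
```

In the real system a client would obtain this answer over the stream manager's REST API rather than calling a local function, but the selection logic is the same idea.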
Our servers cluster together on a peer-to-peer model on the back end of a cloud network. We can deploy on AWS or Google Compute Engine, and we have a specific implementation for each vendor API. We have plans to make those implementations public soon so that people can customize them by writing their own controllers for other cloud platforms. We've recently shared our cloud management code with a large company whose infrastructure is on Microsoft Azure, so we will support Azure soon too.
How does edge origin streaming work?
The origin streams content to the edge, and subscribers will make requests to the stream manager, which sends them to the right edge to connect. The stream manager is monitoring the edge components to determine which one is right based on capacity. It can also do geolocation to connect to the closest edge based on the IP address.
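The geolocation step mentioned above can be sketched as picking the closest edge that still has capacity. This is a hedged illustration using great-circle (haversine) distance; the edge record fields (`lat`, `lon`, `connections`, `capacity`) are assumptions for the example, and a real deployment would resolve the client's coordinates from its IP address first.

```python
import math

def nearest_edge(client_lat, client_lon, edges):
    """Pick the geographically closest edge that still has spare capacity.

    Each edge is an illustrative dict:
        {"host": ..., "lat": ..., "lon": ..., "connections": ..., "capacity": ...}
    """
    def haversine(lat1, lon1, lat2, lon2):
        # Great-circle distance in kilometers.
        r = 6371.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp = math.radians(lat2 - lat1)
        dl = math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    # Only consider edges that can still accept subscribers.
    candidates = [e for e in edges if e["connections"] < e["capacity"]]
    return min(candidates,
               key=lambda e: haversine(client_lat, client_lon, e["lat"], e["lon"]))["host"]
```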
Does autoscaling also mean high availability?
With autoscaling, you can configure your cluster to have different origins. The stream manager takes care of routing the broadcasters to the origin servers. If one goes down, then it’ll take care of spinning up another to replace it, because the stream manager is doing all the monitoring of the system.
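One pass of that monitor-and-replace behavior can be sketched as a reconciliation step. This is a simplified illustration of the idea, not Red5's code; `launch` stands in for the vendor-specific call that spins up a new instance.

```python
def reconcile_origins(cluster, launch, desired_origins=1):
    """Replace failed origins so the cluster keeps its desired count.

    `cluster` is an illustrative structure: {"origins": [{"host": ..., "alive": ...}]}
    `launch` is a callback that spins up a new instance and returns its host name.
    """
    # Keep only origins that responded to the last health check.
    healthy = [o for o in cluster["origins"] if o["alive"]]
    # Spin up replacements until we are back at the desired count.
    while len(healthy) < desired_origins:
        healthy.append({"host": launch(), "alive": True})
    cluster["origins"] = healthy
    return cluster
```

A real monitoring loop would run this continuously, and broadcasters would be re-routed by the stream manager while the replacement boots.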
Do you have clients using this setup?
Yes. We have companies that are deployed across multiple regions on AWS. For example, we have a reality TV show deploying across Central Europe, the Eastern United States, and India that is able to use those three data centers. The celebrities for the show are based in the Middle East and broadcasting themselves. So they want viewers to connect to the edges that are most optimal based on geolocation.
And how does the stream manager work?
It has a vendor-specific, or cloud-vendor-specific, model for spinning up and down instances. We're going to release this piece as open source so people can write their own controllers for different cloud networks. Right now we ship with AWS and Google Compute Engine, and we're adding Microsoft Azure in the first quarter of next year. That's the cloud controller piece. The load-balancing feature balances traffic across the edge and origin instances depending on how many people are broadcasting or subscribing to a stream. It takes care of monitoring the edges and origins, monitoring the load, and adding more instances or spinning them down as needed.
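The controller piece described above amounts to a small vendor-neutral interface with per-cloud implementations behind it. Here is a minimal sketch of what such an interface might look like; the class and method names are assumptions for illustration, and the in-memory implementation stands in for a real AWS, GCE, or Azure backend.

```python
from abc import ABC, abstractmethod

class CloudController(ABC):
    """Vendor-neutral interface for spinning instances up and down.
    A concrete subclass would wrap one cloud vendor's API."""

    @abstractmethod
    def spin_up(self, role: str) -> str:
        """Launch an instance for the given role ('edge' or 'origin');
        return its host name."""

    @abstractmethod
    def spin_down(self, host: str) -> None:
        """Terminate the instance with the given host name."""

class InMemoryController(CloudController):
    """Stand-in implementation for testing, instead of a real cloud API."""

    def __init__(self):
        self.instances = {}
        self._count = 0

    def spin_up(self, role: str) -> str:
        self._count += 1
        host = f"{role}-{self._count}"
        self.instances[host] = role
        return host

    def spin_down(self, host: str) -> None:
        del self.instances[host]
```

Writing a controller for another cloud then means implementing these two operations against that vendor's instance-management API.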
We also have an API that gives you insight into what's going on with different nodes, so you can see how many viewers you have at a time, how many people are broadcasting, and where they're coming from. This way you can do manual overrides with the API, such as adding origins in anticipation of a spike for an event. Those are the premium components: load balancing, the API, and the cloud-specific spinning up and down.
Is Stream Manager like AWS Lambda?
Yes, plus more. All the updates are provided over REST or WebSockets, but it's obviously focused and optimized for low-latency streaming using Red5 Pro.
How many edges or origins are needed to support 20,000 or 50,000 viewers?
As a safe ballpark figure, we use 2,000 concurrent streams per instance, but it depends on the bandwidth and quality of the streams.
Basically, each company's needs are going to vary, so they set their own thresholds. Some people set a lower number so they can spin up new instances once the edges approach capacity. The new instance will take some time to start up, maybe a minute, so during that spike the company will still want capacity on the other edges to use before traffic starts routing to the new one. It's good to keep a buffer in place. Sometimes 2,000 is a sufficient number for a buffer; other times, companies will set a much lower number so they can handle a vast spike and still have capacity while the system reacts.
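To make the sizing question above concrete, here is a rough back-of-the-envelope calculation under the stated assumptions: about 2,000 concurrent streams per edge, minus a headroom buffer so traffic can be absorbed during the roughly one-minute boot time of a new instance. The buffer fraction is an illustrative parameter, not a Red5 recommendation.

```python
import math

def edges_needed(viewers, per_edge=2000, buffer_fraction=0.2):
    """Rough count of edges to pre-provision for a given audience.

    `buffer_fraction` is the share of each edge's capacity held back as
    headroom while a replacement or additional instance boots.
    """
    usable = per_edge * (1 - buffer_fraction)
    return math.ceil(viewers / usable)
```

With a 20% buffer, 20,000 viewers works out to 13 edges and 50,000 viewers to 32; with no buffer at all, 20,000 viewers needs exactly 10.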
Then we have other use cases, such as a company building a product for home security. All the cameras are on most of the time and triggered by certain events, but because only a few people at a time are monitoring on their own phones, the company needs to set up many origins and only a few edge instances.
How is it determined how many instances are needed?
Right now, we like to be very hands-on with customers to help them pick what's best for their needs. It's also complex in the sense that there are a lot of options and it's all very flexible. Companies can set minimum thresholds to spin down, maximum thresholds for adding more, or they can add in the geolocation stuff, which adds another level of complexity.
Also, how each individual app is used is another consideration because some apps have giant spikes quickly, like with our reality TV client, while others are more steady. Facebook Live would be an example of a steady stream with a bunch of people publishing and only a few people watching the streams.
You also want to consider what business needs there are for the app. For example, we have an auction company that allows auction houses to stream out to bidders. They need to support a capacity of one million concurrent viewers, but they also need sub-300ms latency to remain in sync with the data. If someone is live bidding on products, latency is a problem. You wouldn't want to put in a bid on one item and then find out that by the time your bid registered, you had accidentally bought something else, all because the video was delayed by a second or two. Data synchronization and low latency are key for these guys.
What are the difficulties in providing sub-300ms latency to auction houses?
It’s hard stuff, and there are a lot of factors going into it. The code has to be efficient, and you have to be able to do routing quickly. You need to be able to change the container format on the fly to push the stream through for various protocols and endpoints. When dealing with this kind of load, with multiple formats and people broadcasting in different protocols, such as a WebRTC broadcaster and a subscriber with a Flash player, you have to use base protocols and transcode as little as possible. We tend to like H.264; all the protocols can read it, and it has wide hardware encoding/decoding support. This way of doing it is extremely efficient.