Netflix Using Machine Learning and Predictive Caching To Improve QoS

In a recent Netflix blog post, Chaitanya Ekanadham, Manager, Analytics and Data Science, explained some of the technical challenges the company faces and how machine learning and statistical models are helping to overcome them.

Netflix currently has over 117M members around the world. Over half of those members live outside the U.S., and international business represents a major growth opportunity. One of the biggest technical challenges Netflix faces is providing a quality streaming service for this global audience. The engineering effort needed to install and maintain servers across the world is part of the challenge, as is developing algorithms for streaming content from those servers to subscribers. Netflix is also continually pushing to cater to audiences watching on devices beyond the desktop or smart TV. A “one size fits all” solution is becoming “increasingly suboptimal”, Ekanadham explained.

Subscribers both in the U.S. and internationally stream video on a variety of networks and devices, which presents a number of challenges, including:

  • Viewing/browsing behavior on mobile devices is markedly different from that on smart TVs;
  • Different devices have different levels of Internet connectivity due to hardware differences;
  • Cellular networks are often more unstable than fixed broadband networks;
  • Networks in particular markets can have significantly higher amounts of congestion.

Netflix is adapting its methods to these fluctuating conditions, both to keep offering a high-quality service to existing subscribers and to win new customers. In doing so, the company is finding statistical modelling and machine learning particularly useful for observing and monitoring how the experience differs across networks and devices. By understanding more precisely what drives differences in network quality, such as stability and predictability, Netflix is better able to identify where product improvements need to be made. Machine learning, for instance, allows the company to capture network throughput in real time, which helps Netflix’s developers work out what can be done to improve stability.

Machine learning is also proving useful in adaptive streaming algorithms, which select the video quality to stream throughout playback based on current network and device conditions. The aim is to optimize the quality of the stream as those conditions change, which involves trade-offs, for example between higher video quality and the risk of rebuffering. Machine learning techniques, such as recent developments in reinforcement learning, are helping the company decide which trade-offs to make while still maximizing quality of experience.
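To make the idea of adaptive streaming concrete, here is a minimal sketch of a hand-written, throughput-based selection rule. It is not Netflix’s algorithm and it is not reinforcement learning; it only illustrates the kind of decision such an algorithm has to make. The bitrate ladder, the safety margin and the low-buffer threshold are all illustrative assumptions.

```python
from statistics import harmonic_mean

# Hypothetical bitrate ladder in kbit/s (illustrative values only).
BITRATE_LADDER_KBPS = [235, 750, 1750, 3000, 5800]


def estimate_throughput_kbps(samples_kbps):
    """Estimate throughput as the harmonic mean of recent samples.

    The harmonic mean is pulled down by slow samples, so the estimate
    stays conservative on unstable networks.
    """
    return harmonic_mean(samples_kbps)


def choose_bitrate_kbps(samples_kbps, buffer_seconds,
                        safety=0.8, low_buffer_seconds=10.0):
    """Pick the highest ladder rung that fits the throughput budget.

    `safety` and `low_buffer_seconds` are assumed tuning knobs: the budget
    is a fraction of the estimate, and it is halved when the playback
    buffer is nearly empty to reduce the risk of rebuffering.
    """
    budget = safety * estimate_throughput_kbps(samples_kbps)
    if buffer_seconds < low_buffer_seconds:
        budget *= 0.5
    fitting = [rate for rate in BITRATE_LADDER_KBPS if rate <= budget]
    # Fall back to the lowest rung if nothing fits the budget.
    return fitting[-1] if fitting else BITRATE_LADDER_KBPS[0]


# Same network, different buffer levels: a healthy buffer allows a
# higher rung than a nearly empty one.
print(choose_bitrate_kbps([4000, 4000, 4000], buffer_seconds=30))  # 3000
print(choose_bitrate_kbps([4000, 4000, 4000], buffer_seconds=5))   # 750
```

A fixed rule like this encodes one particular trade-off between quality and rebuffering; a learned policy would instead tune that trade-off from observed playback sessions.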

Predictive caching is another area in which statistical models are helping improve the streaming experience. It works by anticipating what a user will play next (e.g. if a user watches one episode of a series, they are likely to watch the next one) and caching it on their device before they hit play, so that the video starts sooner and/or plays at a higher quality. The goal, as Ekanadham put it, is to “maximize the model’s likelihood of caching what the user actually ended up playing, while respecting constraints around resource usage coming from the cache size and available bandwidth”.
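The constrained choice described in that quote can be sketched as follows. This is only an illustration under stated assumptions: the play probabilities are hard-coded stand-ins for the output of a trained model, the titles and sizes are made up, and the greedy "probability per megabyte" ranking is a simple heuristic chosen for the example, not Netflix’s method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    title: str
    play_probability: float  # stand-in for a trained model's prediction
    prefetch_mb: float       # size of the opening segment to prefetch


def prefetch_budget_mb(cache_free_mb, bandwidth_mbps, idle_seconds):
    """Budget is limited by both free cache space and what the link can
    download in the idle window (megabits/s * seconds / 8 = megabytes)."""
    return min(cache_free_mb, bandwidth_mbps * idle_seconds / 8)


def plan_prefetch(candidates, budget_mb):
    """Greedy heuristic: take items with the highest predicted play
    probability per megabyte first, skipping anything that does not fit."""
    ranked = sorted(candidates,
                    key=lambda c: c.play_probability / c.prefetch_mb,
                    reverse=True)
    chosen, used_mb = [], 0.0
    for candidate in ranked:
        if used_mb + candidate.prefetch_mb <= budget_mb:
            chosen.append(candidate)
            used_mb += candidate.prefetch_mb
    return chosen


# Hypothetical candidates for a user who just finished an episode.
CANDIDATES = [
    Candidate("next_episode", play_probability=0.60, prefetch_mb=20),
    Candidate("trailer", play_probability=0.10, prefetch_mb=10),
    Candidate("other_series", play_probability=0.25, prefetch_mb=40),
]

budget = prefetch_budget_mb(cache_free_mb=35, bandwidth_mbps=8, idle_seconds=60)
print([c.title for c in plan_prefetch(CANDIDATES, budget)])
# ['next_episode', 'trailer']
```

Here the cache, not the bandwidth, is the binding constraint, so the larger "other_series" segment is skipped even though it is more likely to be played than the trailer.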

Finally, device anomaly detection is another area in which statistical modelling and machine learning play a crucial role. As new devices enter the Netflix ecosystem, and/or updates are made to firmware, there can be problems with the user experience. Device quality also often degrades over time. Detecting these changes and how they affect streaming is manually intensive and difficult. Ekanadham explained that it is hard to determine the right criteria for labelling something a problem. Netflix’s research team has worked on these problems in depth and built up a history of alerts recording whether or not each issue was actionable, which Netflix is now using to train a model “that can predict the likelihood that a given set of measured conditions constitutes a real problem”. Netflix is already seeing large reductions in overall alert volume, driving efficiency gains for its device reliability team.
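As a rough illustration of training on a labelled alert history, the sketch below fits a small logistic regression and uses it to score new alerts. Everything specific here is an assumption made for the example: the two features, the toy data, and the choice of logistic regression itself. The source does not say which model or features Netflix uses.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, features):
    """Predicted probability that an alert is actionable."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train_logistic(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = score(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Toy, made-up alert history. Hypothetical features, both scaled to 0..1:
# (increase in playback error rate, share of devices affected).
# Label 1 means the alert turned out to be actionable.
HISTORY = [
    ([0.05, 0.10], 0), ([0.10, 0.05], 0), ([0.15, 0.20], 0), ([0.20, 0.10], 0),
    ([0.70, 0.60], 1), ([0.80, 0.90], 1), ([0.60, 0.80], 1), ([0.90, 0.70], 1),
]

rows = [features for features, _ in HISTORY]
labels = [label for _, label in HISTORY]
weights, bias = train_logistic(rows, labels)

# Only raise alerts whose predicted likelihood clears a chosen threshold.
for alert in ([0.85, 0.80], [0.05, 0.05]):
    likelihood = score(weights, bias, alert)
    print(alert, "raise" if likelihood > 0.5 else "suppress")
```

Suppressing alerts that score below the threshold is one way a model of this kind could reduce overall alert volume.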
