Netflix Building Content Delivery Network Data Science and Algorithm Team

Earlier this year, Netflix hit a milestone of 100 million users worldwide, a massive user base that now consumes 125 million hours of streaming content per day from the company's own CDN. Delivering content at this scale presents a significant challenge, one that is aided by Netflix's rapidly growing data science and algorithm team. By leveraging big data on metrics such as subscriber churn and which files are popular in which regions, Netflix is able to maintain quality of experience (QoE) and plan for future changes to its membership numbers, catalog, and consumer devices.

Netflix began building its data science team in 2014 to help gauge the impact of QoE on user behavior. Two factors that tend to affect subscriber churn and viewing time are rebuffering and bitrate. However, because a higher bitrate can itself lead to increased rebuffering, sophisticated data analysis and modeling were needed to determine the tradeoff that maximizes viewing time and viewer satisfaction.
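The tradeoff described above can be sketched as an optimization over a bitrate ladder. This is a toy model with invented numbers, not Netflix's actual analysis: quality is assumed to grow with the log of bitrate, while each step up the ladder raises the estimated rebuffering probability on a given network.

```python
import math

# Hypothetical sketch: pick the bitrate that maximizes expected viewing time
# under an assumed quality/rebuffering model. All constants are illustrative.

def expected_viewing_time(bitrate_kbps, rebuffer_prob, base_minutes=60.0):
    """Toy model: quality scales with the log of bitrate, while rebuffering
    is assumed to cost a fixed penalty in expected minutes watched."""
    quality_factor = math.log2(bitrate_kbps / 235)  # 235 kbps as the floor
    rebuffer_penalty = 40.0 * rebuffer_prob         # assumed minutes lost
    return base_minutes * (1 + 0.1 * quality_factor) - rebuffer_penalty

# Candidate ladder: (bitrate, estimated rebuffer probability on this network).
ladder = [(235, 0.01), (750, 0.02), (1750, 0.05), (3000, 0.12), (5800, 0.30)]
best = max(ladder, key=lambda b: expected_viewing_time(*b))
print(best[0])  # → 3000
```

Note that under these assumed numbers the highest bitrate does not win: past a point, the rebuffering penalty outweighs the quality gain, which is exactly the tension the analysis has to resolve.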

Leveraging this information, Netflix's algorithms determine, in real time, the optimal bitrate and the server from which to download content, ensuring proper load balancing. These decisions are made at both the aggregate level and the individual level, as factors like QoE preferences, the quality of the network, the device used to stream content, and the location of the viewer can all have an impact.
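One way to picture the per-session server decision is as a cost minimization over candidate servers. The article does not describe Netflix's actual selection logic, so the scoring function, weights, and server names below are all invented for illustration:

```python
# Hypothetical sketch: choose the server with the lowest estimated delivery
# cost, blending network distance (latency) with current server load.
# Weights and all numbers are invented, not Netflix's.

def pick_server(servers, latency_weight=1.0, load_weight=50.0):
    """servers: list of (name, latency_ms, load_fraction) tuples."""
    def cost(server):
        _, latency_ms, load = server
        return latency_weight * latency_ms + load_weight * load
    return min(servers, key=cost)[0]

candidates = [
    ("edge-sfo", 12, 0.90),   # closest, but nearly saturated
    ("edge-sea", 25, 0.40),
    ("edge-lax", 18, 0.75),
]
print(pick_server(candidates))  # → edge-sea
```

The point of the sketch is that the nearest server is not always the right answer: a load-aware cost sends this viewer to a slightly more distant but less congested server, which is what "proper load balancing" implies.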

Content delivery decisions, such as which files to cache in order to minimize origin traffic, can also be aided by data on which files are the most popular. In addition, viewing data helps to improve the technical quality of Netflix's content, ensuring that audio, video, and subtitle files are properly aligned. To an extent, issues like these that escape the quality control process can be found through member feedback, but data is often required to interpret such feedback, as it is liable to be skewed by factors such as personal preference and network connection. This feedback can therefore be supplemented by user behavior data, such as a sharp dropoff in viewers at a specific point in a file, in order to pinpoint file errors.
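The dropoff signal mentioned above can be sketched very simply: flag any point in a title where the audience shrinks far faster than the normal gradual decay, since that may mark a defect (for example, misaligned audio) at that timestamp. The viewer counts and threshold below are invented for illustration:

```python
# Hypothetical sketch: find minutes where viewership drops by more than a
# threshold fraction of the previous minute's audience.

def find_sharp_dropoffs(viewers_per_minute, threshold=0.30):
    """Return minute indices where the audience falls by more than
    `threshold` relative to the preceding minute."""
    dropoffs = []
    for minute in range(1, len(viewers_per_minute)):
        prev, cur = viewers_per_minute[minute - 1], viewers_per_minute[minute]
        if prev > 0 and (prev - cur) / prev > threshold:
            dropoffs.append(minute)
    return dropoffs

# Gradual decay, then a crash at minute 5 — a candidate file error.
counts = [1000, 980, 965, 950, 940, 400, 390, 385]
print(find_sharp_dropoffs(counts))  # → [5]
```

In practice such a detector would need to distinguish defects from editorial causes of abandonment (a slow scene, the end of an episode), which is where the interpretation work described in the article comes in.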

The quality control process also leans heavily on data science, using predictive modelling and machine learning algorithms both to detect errors and to make more efficient use of manual quality control checks. In the quality control workflow, Netflix receives content from studios and fulfillment partners, then performs routine inspections to replace poor-quality files. Automated inspection then compresses files at various bitrates and for various devices, followed by a final manual check, either of the full file or a spot check.

With predictive quality control, machine learning is used to reallocate manual quality checking resources, ensuring manpower is spent on hard-to-find problems that might be missed during a minimal spot check. Netflix uses data on past errors to identify which assets are likely defective, such as older files or files from particular content suppliers. Files that are predicted to contain errors are then redirected for a full quality control check. Finally, offline validation of the machine learning algorithms is performed and compared against manual quality checks to fine-tune the model and validate its efficacy.
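The routing step described above can be sketched as a risk score over asset features, with high-risk assets diverted to a full check. The article names file age and content supplier as predictive features; the scoring model, supplier names, rates, and threshold below are all invented placeholders for whatever model Netflix actually trains:

```python
# Hypothetical sketch: score each asset's defect risk from historical error
# rates (here, by supplier and file age), then route high-risk assets to a
# full manual check instead of a spot check. All data is invented.

historical_error_rate = {"supplier_a": 0.02, "supplier_b": 0.15}

def defect_risk(asset):
    """Combine supplier history with an age penalty (assumed model)."""
    base = historical_error_rate.get(asset["supplier"], 0.05)
    age_penalty = 0.01 * asset["age_years"]  # assume older masters fail more
    return base + age_penalty

def route_for_qc(assets, full_check_threshold=0.10):
    full, spot = [], []
    for asset in assets:
        target = full if defect_risk(asset) >= full_check_threshold else spot
        target.append(asset["id"])
    return full, spot

assets = [
    {"id": "title-1", "supplier": "supplier_a", "age_years": 1},
    {"id": "title-2", "supplier": "supplier_b", "age_years": 0},
    {"id": "title-3", "supplier": "supplier_a", "age_years": 12},
]
full, spot = route_for_qc(assets)
print(full)  # → ['title-2', 'title-3']
```

The offline validation the article mentions would then compare these routing decisions against the outcomes of manual checks, adjusting the model until the full-check queue captures most real defects.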

Currently, Netflix's data science team is at work on improving content delivery by leveraging data on when, where, and how content is consumed in order to assist with caching, load balancing, and encoding. Netflix uses its own CDN, called Open Connect, to reduce costs, relieve congestion, and improve service quality for its members. In doing so, it must optimize content allocation among clusters of servers to ensure proper load balancing. Real-time algorithms distribute content among clusters based on consistent hashing, a pseudo-random method that is stable and repeatable. However, any degree of randomness in file allocation can result in overloaded areas. To address this issue, Netflix is currently tailoring these algorithms to specific clusters to improve load balancing.
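A minimal consistent-hash ring illustrates the "stable and repeatable pseudo-random" allocation described above. This is the textbook technique, not Netflix's Open Connect implementation; the server names and virtual-node count are assumptions:

```python
import bisect
import hashlib

# Minimal consistent-hash ring sketch: each server owns many points ("virtual
# nodes") on a hash ring, and a file maps to the first server point at or
# after the file's own hash. Adding or removing a server moves only the keys
# adjacent to its points, which is what makes the allocation stable.

class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):
        self.ring = []  # sorted list of (hash, server) pairs
        for server in servers:
            for i in range(vnodes):
                bisect.insort(self.ring, (self._hash(f"{server}:{i}"), server))
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, filename):
        """Walk clockwise to the first vnode at or after the file's hash."""
        idx = bisect.bisect(self.keys, self._hash(filename)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["oca-1", "oca-2", "oca-3"])
# Repeatable: the same file always maps to the same server.
assert ring.server_for("title.mp4") == ring.server_for("title.mp4")
```

The overload problem the article notes follows directly from this design: the hash spreads files pseudo-randomly but knows nothing about their popularity, so a cluster can still end up holding a disproportionate share of hot titles, hence the per-cluster tailoring.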

As Netflix's global user base continues to grow, and as technology and consumption patterns continue to change, data is used to assist with long-term planning. For example, data on increased mobile use or a surge in 4K TVs may affect user expectations for bitrate. Plans for new servers must be made at least a year in advance to ensure they are operational when needed, but with increasing amounts of user data and additional investments in its data science team, Netflix will be able to predict and plan for these changes well ahead of schedule.
