Uber’s Machine Learning Projects – Manifold, Michelangelo and Horovod

January 22, 2019

A Brief History of Machine Learning at Uber

Uber has been working on machine learning (ML) for several years, and first introduced Michelangelo, its centerpiece machine learning platform, in September 2017. The company describes its overall contribution to AI and ML in terms of a goal “to cover the end-to-end ML workflow: manage data, train, evaluate and deploy models, make predictions, and monitor predictions”, while supporting traditional ML models, time series forecasting, and deep learning.

In 2015, Uber was not yet widely using ML. However, as Uber’s services have grown, Uber says there has been “an explosion of ML deployments across the company” with “hundreds of use cases” in play at any time, “representing thousands of models [that] are deployed in production on the platform”. The goal has been to make the “pervasive deployment” of ML a strategic priority.

Uber started its ML efforts small, focusing on enabling large-scale batch training and productionizing batch prediction jobs, then grew the system incrementally. Since 2015, Uber has added numerous components and integrations, including a centralized feature store, a low-latency prediction service that operates in real time, model performance reports, notebook integrations, deep learning workflows and partitioned models.

An Active Contributor to ML Technologies

In three years, Uber has gone from having no centralized ML efforts and only a handful of bespoke ML projects to having advanced ML tools and infrastructure, and hundreds of production ML use cases.

In doing so, Uber has become one of the most active contributors to open source machine learning tools, technologies and best practices. Building out from Michelangelo, the company’s other highly usable technologies include Horovod, PyML, Pyro and, most recently, Manifold. Only a handful of companies are building ML efforts at this scale, and Uber is both one of the most prolific and one of the most open about its work in the space.

Manifold: Uber’s New ML Project

One of the primary uses for ML at Uber is supporting smart decision-making and forecasting for features like ETA prediction and fraud detection. Model development there typically follows the “80/20 rule”, or the “20/80 split rule of ML model development”, in which 20% of the effort goes into constructing initial working models and the other 80% goes into improving model performance.

When building models, data scientists traditionally evaluate each candidate using summary scores such as area under the curve (AUC), log loss and mean absolute error (MAE). While these metrics show how well a model is performing, they do little to convey why a model is underperforming or how to improve it. As a result, model builders largely rely on old-fashioned trial and error when exploring ways to improve their models.
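To make the limitation concrete, here is a minimal sketch of the three summary scores named above, written in plain Python. The function names and the toy labels and probabilities are illustrative assumptions, not anything from Uber’s tooling.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood for binary labels (0 or 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between target and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """Chance that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data, invented for illustration only.
labels = [1, 0, 1, 1, 0]
probs = [0.9, 0.2, 0.6, 0.4, 0.7]

model_auc = auc(labels, probs)
model_log_loss = log_loss(labels, probs)
regression_mae = mean_absolute_error([10, 12, 15], [11, 12, 13])
```

Each function collapses the whole dataset into a single number. Nothing in the output says which rows were mispredicted or what those rows have in common, which is the gap the next sections describe.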

Manifold was developed with the goal of solving some of these issues and making the model iteration process “more informed and actionable”.

An In-House Visualization Tool

Manifold’s architects describe it as “Uber’s in-house model-agnostic visualization tool for ML performance diagnosis and model debugging”. By leveraging visual analytics techniques, Manifold lets users look past summary metrics to the subsets of data that a model is predicting inaccurately. In addition, Manifold surfaces the differences in feature distribution between better- and worse-performing subsets of data to help explain potential causes of poor model performance. It also shows how prediction accuracy for the same subset of data can differ between candidate models, offering justification for advanced solutions such as model ensembling.

The overall intention behind Manifold is to make Uber’s ML models more transparent and easier to understand so that users can deploy ML-generated predictions from the company “with confidence and trust”.

How Manifold Works

Uber flipped the standard ML model visualization on its head with Manifold. Rather than inspecting models, Manifold inspects individual data points by: (i) identifying the data segments on which a model performs well or badly, and exploring how this data affects performance between models; and (ii) assessing the aggregate feature characteristics of those data segments to discern the causes behind different model behaviors. Because this approach looks only at data and predictions, it is model-agnostic, which is useful for identifying opportunities for model ensembling.
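The two steps above can be sketched in a few lines of plain Python. This is a rough illustration under stated assumptions, not Manifold’s implementation: a simple median split on absolute error stands in for whatever segmentation Manifold actually performs, and the helper names (`split_by_error`, `feature_gaps`) and the trip data are hypothetical.

```python
from statistics import mean, median

def split_by_error(rows, y_true, y_pred):
    """Step (i): split rows into (low_error, high_error) around the median absolute error."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    cutoff = median(errors)
    low = [r for r, e in zip(rows, errors) if e <= cutoff]
    high = [r for r, e in zip(rows, errors) if e > cutoff]
    return low, high

def feature_gaps(low, high):
    """Step (ii): per-feature difference in mean between segments, largest gap first."""
    gaps = {
        name: mean(r[name] for r in high) - mean(r[name] for r in low)
        for name in low[0]
    }
    return sorted(gaps.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical trips: the model does noticeably worse on long-distance rows.
rows = [
    {"distance_km": 2.0, "hour": 9},
    {"distance_km": 3.0, "hour": 18},
    {"distance_km": 25.0, "hour": 10},
    {"distance_km": 30.0, "hour": 17},
]
y_true = [10.0, 12.0, 50.0, 62.0]
y_pred = [11.0, 12.5, 38.0, 45.0]

low, high = split_by_error(rows, y_true, y_pred)
ranked = feature_gaps(low, high)
```

Here `ranked` puts `distance_km` first because the high-error segment is made up of the long trips, while `hour` shows no gap. The sketch only ever touches features, labels and predictions, never model internals, which is why the same analysis works for any model. It assumes both segments are non-empty.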

Manifold serves the majority of ML models, beginning with classification and regression models. It also offers visibility into the black box of ML model development by revealing feature distribution differences between data subsets.

Michelangelo

Michelangelo is the centerpiece of Uber’s machine learning stack. It was built as an ML-as-a-Service platform for internal ML workloads at Uber. The platform automates different components of the ML model lifecycle, enabling different engineering teams to build, deploy, monitor and operate ML models at scale. In particular, Michelangelo abstracts the lifecycle of an ML model into a highly sophisticated workflow.

Michelangelo at Scale

The architecture behind Michelangelo draws on a modern yet complex stack based on various technologies, including HDFS, Spark, Samza, Cassandra, MLlib, XGBoost, and TensorFlow.

Uber has built its own proprietary toolset called the Data Science Workbench (DSW), which trains models on large GPU clusters using various machine learning toolkits. Uber has aimed to simplify the production and operations side of building and deploying ML systems, allowing teams to own their work across the entire lifecycle. On the training side, the ML tooling simplifies the data science behind the systems, making it easier to train sufficiently high-quality models without the need for a data scientist. There are also ML infrastructure components that allow detailed customization of configurations and workflows, which are popular with highly experienced engineers.

Uber has said that in order to scale successfully, it has had to get “more than just the technology right”, and it attributes its success in scaling to “critical success factors across three pillars: organization, process, as well as technology”.

Organization – This involves allocating scarce expertise to the right projects at the right time in order to amplify its impact across several different ML problems. Another key finding for Uber was that things work well when the product engineering teams own the models they build and deploy in production. If they need assistance, they can get it from the research or specialist teams at the right time for them. There are also ML Platform teams, which build and operate the general-purpose ML workflow and toolset that the product engineering teams use to build, deploy and operate ML solutions.

Process – Uber has found that various processes “have proven useful to the productivity and effectiveness of our teams”. These include sharing best practices in areas such as deployment management and data organization methodology, and putting more structured processes in place, such as launch reviews, to guide teams and help them avoid repeating mistakes others have made.

Technology – Uber has crystallized its findings in the technology space down to four essential high-level areas:

End-to-end workflow: Providing support for the entire ML workflow: managing data, training models, evaluating models, deploying models, and issuing and monitoring predictions.
ML as software engineering: Drawing analogies between ML development and software development, then applying patterns from software development tools and methodologies back to Uber’s approach to ML.
Model developer velocity: Uber has found that “innovation and high-quality models come from lots and lots of experiments”. As a result of this, model developer velocity is critical.
Modularity and tiered architecture: Providing end-to-end workflows is important for handling the most common ML use cases, but for the less common cases, it’s essential to have primitive components that can be assembled in targeted ways.

Michelangelo Use Cases

Michelangelo powers hundreds of different ML scenarios across Uber, including:

Uber Eats:

Ranking Models, including restaurant and menu recommendations
Estimating meal arrival times based on predicted ETAs, real-time signals for the meal and restaurant, and historical data
Customer Support: Thousands of support tickets come in each day with queries about left luggage or other problems; ML models built in Michelangelo are used to automate and speed up responses and the resolution of issues
Spatiotemporal forecasting is used to predict revenue, production and spending, for hardware capacity planning and marketplace forecasting – built from a combination of machine learning, deep learning and probabilistic programming

Ride Share:

Estimated Times of Arrival (ETAs) – Uber’s Map Services team has constructed a segment-by-segment routing system, which is used to work out base ETA values (a notoriously tricky area to get right). The base values consistently show patterns of errors. The team realized they could use an ML model to predict these errors and apply the predictions as a correction. Since this model was rolled out city by city (then globally), ETA accuracy has improved dramatically, in some instances reducing average ETA error by over 50%.

One-Click Chat – this streamlines communications between passenger and driver through the use of natural language processing (NLP) models, which predict and display the most frequent replies to in-app chat messages, helping drivers focus on the road with fewer distractions.
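The ETA correction described above, where a model learns the error of the base estimate and the prediction is added back, can be sketched as follows. This is a deliberately tiny stand-in: a one-variable least-squares line plays the role of the ML model, and the function names and trip times are invented for illustration. The source does not say which model or features Uber uses.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return mean_y - slope * mean_x, slope

# Hypothetical history, in minutes: routing-engine ETA vs. observed trip time.
base_eta = [5.0, 10.0, 15.0, 20.0]
observed = [6.0, 12.0, 18.0, 24.0]

# The model's target is the error of the base ETA, not the trip time itself.
residuals = [obs - base for obs, base in zip(observed, base_eta)]
intercept, slope = fit_line(base_eta, residuals)

def corrected_eta(base):
    """Base ETA plus the predicted error for that base ETA."""
    return base + (intercept + slope * base)

new_trip_eta = corrected_eta(30.0)
```

In this toy history the base ETA is always 20% too low, so the fitted correction adds 6 minutes to a 30-minute base estimate. Predicting the residual rather than the full trip time keeps the routing system as the foundation and lets the model focus only on its systematic errors.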

Horovod

Horovod is an open source part of Michelangelo’s deep learning toolkit and one of Uber’s most widely esteemed ML solutions.

Alex Sergeev, the Horovod Project Lead, led the building of Horovod while developing Uber’s in-house deep learning platform. Existing open source deep learning solutions weren’t meeting Uber’s needs for performance, usability and scale, so Sergeev and his team set out to build their own. They released it in September 2017 to make other AI practitioners’ lives easier by enabling distributed training of TensorFlow, Keras, and PyTorch models with just six lines of code. The goal is to make distributed deep learning fast and easy to use.

Sergeev says that to his knowledge, Horovod is “one of the few framework-agnostic solutions that scales other deep learning frameworks”. He describes their goal as being “to have the same infrastructure for any framework that we wanted to scale”.

Horovod builds upon Baidu’s draft implementation of the TensorFlow ring-allreduce algorithm. The framework uses Open MPI (or other MPI implementations) for message passing between nodes, and the Nvidia Collective Communications Library (NCCL) for its optimized version of ring-allreduce.
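For readers unfamiliar with ring-allreduce, the following is a single-process simulation of the idea in plain Python. It is not Horovod’s code, and it uses no MPI or NCCL: the “workers” are just lists, and the function name `ring_allreduce` and the toy gradients are assumptions made for illustration.

```python
def ring_allreduce(worker_data):
    """Simulate a sum ring-allreduce over equal-length lists, one list per worker."""
    n = len(worker_data)
    length = len(worker_data[0])
    assert length % n == 0, "sketch assumes the vector splits evenly into n chunks"
    size = length // n
    data = [list(v) for v in worker_data]  # copy so the inputs stay untouched

    def chunk(c):
        return slice(c * size, (c + 1) * size)

    # Phase 1, scatter-reduce: each worker passes one chunk to its right-hand
    # neighbour, which adds it in. After n - 1 steps, worker i holds the
    # complete sum for chunk (i + 1) % n.
    for step in range(n - 1):
        sent = [data[i][chunk((i - step) % n)] for i in range(n)]  # snapshot: sends are simultaneous
        for i in range(n):
            dst, c = (i + 1) % n, (i - step) % n
            data[dst][chunk(c)] = [x + y for x, y in zip(data[dst][chunk(c)], sent[i])]

    # Phase 2, allgather: the finished chunks travel around the ring,
    # overwriting the partial values on every other worker.
    for step in range(n - 1):
        sent = [data[i][chunk((i + 1 - step) % n)] for i in range(n)]
        for i in range(n):
            dst, c = (i + 1) % n, (i + 1 - step) % n
            data[dst][chunk(c)] = sent[i]

    return data

# Three workers, each holding its own local gradient.
gradients = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
reduced = ring_allreduce(gradients)
```

Every worker ends up with the same summed vector. Each worker only ever talks to its neighbour and sends 2 × (n − 1) chunks of size length / n in total, so the data sent per worker stays roughly constant as workers are added. That property is what makes the pattern attractive for synchronizing gradients across many GPUs.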

According to InfoWorld, which named Horovod among the best open source software of 2018, it achieves 90% scaling efficiency for Inception V3 and ResNet-101, and 68% scaling efficiency for VGG-16, using up to 512 Nvidia Pascal GPUs.
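The article does not define scaling efficiency. A common definition, assumed here, is measured throughput on N GPUs divided by N times the single-GPU throughput. The short sketch below shows the arithmetic with made-up throughput numbers that are not benchmark results.

```python
def scaling_efficiency(throughput_n, throughput_1, n_gpus):
    """Measured throughput on n GPUs as a fraction of ideal linear scaling."""
    return throughput_n / (n_gpus * throughput_1)

# Hypothetical figures: 100 images/s on 1 GPU, 46,080 images/s on 512 GPUs.
# Ideal linear scaling would give 512 * 100 = 51,200 images/s.
efficiency = scaling_efficiency(46_080.0, 100.0, 512)
```

Under this definition, 90% efficiency on 512 GPUs means the cluster delivers about 461 GPUs’ worth of single-GPU throughput, with the remainder lost mainly to communication and synchronization overhead.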

Use Cases

Horovod is used internally by the project engineering teams developing Uber’s self-driving car systems, which deploy deep learning models for tasks including motion planning and object detection. Modelers use Horovod for efficient distributed training of large models across many GPU machines.

Externally, organizations that have put the system to use range from NVIDIA, the GPU inventor, which uses it to scale testing of its GPUs, to the Oak Ridge National Laboratory, the U.S. Department of Energy’s supercomputing-focused research institute. Across 2018, an increasing number of deep learning ecosystems integrated Horovod into their workflows, including AWS, Azure, Google and IBM Watson.