Facebook’s Billion-User Load Balancing

Categories

As part of our ongoing series of videos that highlight challenges in distributed computing, we’d like to summarize the data points from Patrick Shuff, an Production Engineer on Facebook’s Traffic Team, on how they scale their load balancing infrastructure to support Facebook’s billion plus base of daily users.

Essentially, each of Facebook’s data centers consist of multiple clusters that are each comprised of tens of thousands of Hip-Hop Virtual Machines (HHVMs), hundreds of Layer 7 load balancers, and tens of Layer 4 open-source TCP load balancers. For all dynamic requests between users and HHVMs, Facebook uses ECMP routing to consistently hash the requests to the Layer 4 load balancers. These in turn consistently hash to Layer 7 load balancers, which proxy requests to HHVMs. In addition to Facebook’s data centers, which are currently located across North America and in Sweden, they have PoPs located globally, which are set up similarly to data centers, but without the presence of HHVMs. Considerations such as the closest edge to a user, each server’s health and capacity, and the user’s geolocation all factor into where to send requests, which is determined by a system they’ve dubbed Cartographer.

Below, we’ve summarized the main points of Patrick’s discussion to the minute, outlining the specifics of Layer 4 and Layer 7 load balancing, how hashing is implemented, and how servers and Edge PoPs are deployed globally to reduce latency.

4:20 – Provides Facebook’s scale, as of September 2016, includes 1.79b total users, 1.18b daily users, and 1.09b daily mobile users.
4:50 – Although most of Facebook’s data centers are located in the United States, 84.9% of their daily active users reside outside of North America.
7:00 – Shows Facebook’s weekly and daily usage cycle, which consistently peaks in the morning, with another peak in the evening for all time zones.
8:45 – Reviews the OSI model table for standard communication between devices. The three layers discussed will be Layer 3 (Internet), Layer 4 (TCP), and Layer 7 (HTTP), and the presentation will deal with load balancing dynamic requests between users and HHVMs.
10:27 – Provides a visual for how a request is put together over the three layers as it travels from its source to destination.
10:56 – Begins with a single HHVM server, serving traffic for .09b users. A user request would travel to and from DNS, then to HHVM to establish the TCP connection and SSL handshake and return the request to the user. This is okay for a small setting, but Facebook needs to serve considerably more requests per second (RPS).
11:36 – A Layer 7 load balancer (L7LB) is added to increase RPS. The L7LB is used to terminate a TCP SSL connection and proxy the L7 request back to one of several HHVM servers.
12:50 – A Layer 4 load balancer (L4LB) is added for more RPS. Each packet that comes into the L4LB is consistently hashed to one of several L7LBs (this maintains the TCP connection).
13:58 – A router is used for load balancing to add more RPS. The routers use ECMP to send requests to consistently hash requests to multiple L4LBs, which consistently hash to L7LBs, which proxy requests to HHVM servers. This system, comprising of a router using ECMP, L4LBs, L7LBs, and HHVMs, is a front-end web cluster.
15:17 – The router creates a single point of failure, and another cluster is added to establish redundancy. Multiple clusters are gathered into data centers, and additional data centers are added to further increase capacity and diversity.
16:15 – Rather than a top-down approach, Facebook uses a flat infrastructure that does not use specialized servers to execute special functions. This allows them to schedule out jobs to various systems and makes it easy to scale up or down based on capacity needs.
18:30 – Explains how L4 and L7 load balancing works. A BGP (Border Gateway Protocol) connection is used to establish communication between the L3 router and the L4LB and determine which routes are available.
19:38 – Traffic flows into the L4LB, where a service discovery framework has been built on top of Zookeeper. Zookeeper keeps a map of all L7LBs which determines which L7LB to hash requests to.
20:35 – Explains how hashing works. When a request comes in to the L4LB, it does a consistent hash on the four tuple (source IP, source port, source destination, and destination port) to determine which L7 to send the request to.
21:29 – Each L4LB keeps a state table of where it sends connections, so that if one L4LB is lost, the request can go to another L4LB and be sent back to the same L7LB to preserve TCP.
23:00 – If a L7LB connection is lost, TCP is broken, and the request must be hashed to another L7LB. When the L7LB’s connection comes back, however, the state table is used to determine where the request went and ensure that it does not break TCP again by rerouting back to its previous L7LB.
23:50 – Explains Direct Server Return (DSR). When responses are returned from HHVM, it is sent to the L7LB, which cuts the L4LB out of the loop and sends the request directly back to the user. This works through a process called IP in IP encapsulation, where L4LBs wrap an IP packet around the request as it is hashed to the L7LB. The L7LB decapsulates the packet, sees the original request, processes it, and sends it back to the user.
27:51 – Discusses Edge PoPs, with an example of sending information between South Korea and Oregon. Because of the multiple round trips required for each request, the total time for each communication is 600 ms. However, adding a PoP in Japan decreased the total time to 240 ms.
29:59 – Facebook deploys Edge PoPs around the world to improve global user experience. Load balancing in each PoP works exactly as in the data centers, but with no HHVM servers. Instead, all TCPs are terminated at the PoPs and sent back to data centers to process.
32:00 – Considerations for where to send requests include information such as the closest edge to a user, the capacity of the servers, the health of the servers, and the user’s geodata.
34:35 – Cartographer is the system that takes this information and makes real-time decisions on where to allocate requests. It creates a map of the world that is frequently updated with information on anomalies and capacity restraints.