Fastly and Netflix Introduce 100G Cache Servers


At the start of August, Fastly announced that it had added several new points of presence (PoPs) to its global network, bringing the total to 40 worldwide; the most recent addition is a state-of-the-art facility in Rio de Janeiro, Brazil. The company has also expanded its US network to increase capacity, most recently by designing an innovative modular PoP based on 100G Ethernet technology, which is also being leveraged at the new Rio location.

Large Internet traffic spikes are everyday occurrences – whether the result of an article going viral or a deliberate attack on a site – and 100G cache servers will help content delivery networks (CDNs) like Fastly better handle them.

Codenamed “Project Doughnut”, Fastly’s new PoP design moves from 10G to 25G Ethernet connections to each cache server, allowing Fastly to scale from 40G per cache to 100G per cache. More port-dense network switches also increase request processing and network capacity: Fastly has recently doubled the number of cache servers housed in any one PoP.
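
The arithmetic behind that jump is straightforward, as the sketch below illustrates. It assumes four NIC ports per cache server (a figure implied by the 40G and 100G totals, not stated explicitly) and hypothetical per-PoP server counts:

```python
# Illustrative only: port counts and PoP sizes are assumptions
# inferred from the 40G -> 100G per-cache figures in the article.

def cache_capacity_gbps(ports_per_cache: int, gbps_per_port: int) -> int:
    """Aggregate NIC capacity of a single cache server."""
    return ports_per_cache * gbps_per_port

old = cache_capacity_gbps(ports_per_cache=4, gbps_per_port=10)  # 40G per cache
new = cache_capacity_gbps(ports_per_cache=4, gbps_per_port=25)  # 100G per cache

# Doubling the cache count in a PoP (enabled by port-dense switches)
# multiplies the gain again at the PoP level.
caches_old, caches_new = 16, 32  # hypothetical PoP sizes
print(f"old PoP: {old * caches_old} Gbps, new PoP: {new * caches_new} Gbps")
```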

Its international network now exceeds 15 Tbps of connected Internet capacity and is growing fast. The implementation of 100G Ethernet, coupled with port-dense 5 Tbps switching platforms, enables rapid scaling of its network edge.

Netflix’s global CDN, Open Connect, also recently decided to take advantage of the new 100GbE network interface technology and deploy 100G cache servers of its own. Previously, like Fastly’s, the majority of Netflix Open Connect’s storage-based appliances were limited to 40 Gbps, built around a single-socket Xeon E5-2697v2.

Netflix Open Connect already operates thousands of its own custom-built caching servers in data centers around the world. International consumers can therefore stream their favorite TV show from a caching server inside their own ISP’s network, from one serving the same region, from a cache further away, or directly from servers that Netflix rents from Amazon’s cloud service. The intent is to minimize long-distance network traffic and distribute files as fast as possible.
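
As a rough illustration of that fallback order (not Netflix’s actual client-steering logic, which is not public), a selection function might walk the tiers from nearest to farthest; the tier names and helpers below are hypothetical:

```python
# A minimal sketch of tiered cache selection, nearest tier first.

def choose_source(client, title, tiers):
    """Return the first tier that holds the title and can serve the client."""
    for tier in tiers:
        if tier.has_title(title) and tier.serves(client):
            return tier
    raise LookupError("no source available")

class Tier:
    def __init__(self, name, titles, networks):
        self.name, self.titles, self.networks = name, set(titles), set(networks)
    def has_title(self, title):
        return title in self.titles
    def serves(self, client):
        return client["network"] in self.networks or "any" in self.networks

tiers = [
    Tier("embedded-in-ISP", {"show-a"}, {"isp-1"}),            # same network
    Tier("regional-cache", {"show-a", "show-b"}, {"any"}),     # same region
    Tier("remote-cache", {"show-a", "show-b"}, {"any"}),       # further away
    Tier("aws-origin", {"show-a", "show-b", "show-c"}, {"any"}),
]
print(choose_source({"network": "isp-1"}, "show-b", tiers).name)
```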

Additionally, Open Connect uses special algorithms to predict what will be popular where, and proactively caches that content so that its consumers don’t have to deal with buffering delays.
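
A heavily simplified sketch of that idea follows. It assumes a per-region popularity score and greedy filling by predicted value per gigabyte; Netflix’s real prediction models are not public, so all names and numbers here are illustrative:

```python
# Hypothetical pre-positioning sketch: scores and sizes are made up.

def prefill(cache_capacity_gb, predicted):
    """Greedily cache the titles with the best predicted popularity
    per gigabyte until the cache is full."""
    chosen, used = [], 0
    for title, score, size_gb in sorted(
        predicted, key=lambda t: t[1] / t[2], reverse=True
    ):
        if used + size_gb <= cache_capacity_gb:
            chosen.append(title)
            used += size_gb
    return chosen

predicted = [("show-a", 0.9, 40), ("show-b", 0.6, 10), ("film-c", 0.3, 25)]
print(prefill(cache_capacity_gb=60, predicted=predicted))  # ['show-b', 'show-a']
```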

The 100G cache servers enable even faster delivery of its content; the work towards this goal, which began over two years ago, was recently detailed on the Netflix Technology Blog by Drew Gallatin.

The first step towards serving 100 Gbps from an Open Connect Appliance was what Netflix calls “Fake NUMA”, introduced to avoid the disk bandwidth limitations that cause temporary disk bottlenecks. This involved “lying” to the system by pretending it had one fake NUMA domain for every two central processing units (CPUs). This improved performance to 52 Gbps, with significant CPU idle time remaining.
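
Conceptually, the trick maps each pair of CPUs to its own small domain so that a connection’s packets and buffers stay on one slice of the machine. The sketch below captures only that mapping idea; the real change lives inside the FreeBSD kernel, and the domain sizing and hashing here are illustrative:

```python
# Conceptual sketch of "fake NUMA": pretend each pair of CPUs is its
# own domain, then keep all work for a connection inside one domain.

CPUS = 24  # e.g., a single-socket Xeon with 24 logical CPUs (assumed)

def fake_numa_domain(cpu_id: int) -> int:
    """One fake NUMA domain for every 2 CPUs."""
    return cpu_id // 2

def domain_for_connection(conn_id: int) -> int:
    """Pin a connection (and its buffers) to one fake domain, so that
    lock and memory contention is confined to 2 CPUs, not all 24."""
    return conn_id % (CPUS // 2)

assert fake_numa_domain(0) == fake_numa_domain(1) == 0
assert fake_numa_domain(22) == fake_numa_domain(23) == 11
print(domain_for_connection(12345))  # -> domain 9
```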

After increasing to 60 Gbps on new hardware (an Intel Xeon E5-2697v3 CPU, PCIe Gen3 x16 slots for the 100GbE NIC, and increased disk storage: 4 NVMe or 44 SATA SSD drives), the team hit a new bottleneck, similarly related to a lock on a global list.
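
The general remedy for this class of bottleneck is to shard one hot, globally locked structure into many independently locked ones, which is also the effect the fake-NUMA partitioning achieves. A generic illustration follows; the list being protected is hypothetical, not FreeBSD’s actual data structure:

```python
# Generic illustration of removing a global-lock bottleneck by sharding.
import threading

class GlobalList:
    """One lock: every CPU serializes on the same mutex."""
    def __init__(self):
        self.lock = threading.Lock()
        self.items = []
    def add(self, item):
        with self.lock:
            self.items.append(item)

class ShardedList:
    """One lock per shard (e.g., per fake NUMA domain): contention
    is confined to the CPUs sharing that shard."""
    def __init__(self, shards=12):
        self.locks = [threading.Lock() for _ in range(shards)]
        self.items = [[] for _ in range(shards)]
    def add(self, item, shard):
        with self.locks[shard]:
            self.items[shard].append(item)
```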

After continued experimentation, Netflix can now serve 100% Transport Layer Security (TLS) traffic at 90 Gbps using the default FreeBSD TCP stack. The team at Open Connect is pursuing various new ideas to further save memory bandwidth, including improving the efficiency of LRO and optimizing the new TCP code. Watch this space.
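
For context, LRO (large receive offload) cuts per-packet processing cost by coalescing consecutive TCP segments of the same flow into one large segment before the stack sees them, so improving its efficiency means coalescing more of the incoming stream. The toy model below shows only the merging idea; real LRO runs in the NIC or driver, and the segment format here is invented:

```python
# Toy model of LRO coalescing: merge in-order segments of the same TCP
# flow so the stack processes one large segment instead of many small ones.
# Segment format is invented for illustration: (flow_id, seq, payload).

def lro_coalesce(segments):
    merged = []
    for flow, seq, data in segments:
        if merged:
            pflow, pseq, pdata = merged[-1]
            if pflow == flow and pseq + len(pdata) == seq:
                merged[-1] = (pflow, pseq, pdata + data)  # contiguous: extend
                continue
        merged.append((flow, seq, data))
    return merged

segs = [(1, 0, b"aa"), (1, 2, b"bb"), (1, 4, b"cc"), (2, 0, b"xx")]
print(lro_coalesce(segs))  # flow 1 collapses into one 6-byte segment
```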
