When most people hear the term “networking,” they think of cables, routers and switches. Every CDN out there does networking, building out infrastructure to grow its network. But Fastly does a different kind of networking, and it all involves software.
The Fastly Architecture
In the beginning, the Fastly founders invested their money in commodity hardware, which left them unable to afford dedicated networking boxes. They were stuck and needed another way to perform all the basic network functions, which is how they came to develop their own software. So they accidentally ended up in a nice place.
For traditional CDNs, networking is infrastructure: purely something you buy as a sunk cost, with a fixed feature set that depreciates over time. But for Fastly, the network is the application: a set of functions that you can expose to services and eventually serve to the customer.
Cutting Through Marketing Fluff
To market to customers, a lot of CDNs will try to sell you “features” that aren’t really features at all. First off, anytime CDNs discuss networking, it is always in the form of a map. To them, networking means having a global presence, often because they have fixed networks that cannot be changed or adapted the way Fastly’s software can. So they prefer to sell you, the customer, on their network based on reach.
Another feature that CDNs often use for marketing purposes is the number of PoPs. Yes, as more PoPs are added the round-trip time (RTT) gets lower, which is a feature, but after a certain number it will not affect you or your platform. There is a limit to how close to end users you can get, beyond which adding more data centers does nothing for your specific needs. Once you reach that optimal level, it’s better to focus on other features that will directly help or hinder your performance.
Bandwidth is another buzzword that is often flaunted as a feature, but again it is just a number. Given all the other factors, such as traffic flow, capacity and economies of scale, you reach a point where adding more bandwidth to your network no longer matters: performance stays about the same.
The same goes for latency; after a certain point, a change in the perceived latency isn’t going to have any notable effects on the system, even though CDNs market it to you as a significant indicator of network performance. A lot of other optimization factors and features carry more importance when it comes to performance, and below we’ll address three key aspects of a CDN, and how the Fastly network manages them.
One main issue that all CDNs face is that the internet is constantly throwing packets at a PoP, which is really just routers with servers underneath. To manage this traffic, you need a load balancer, whose job is to spread traffic efficiently and evenly. You also have the option of DNS load balancing, where you simply list the IPs and requests are spread across them round robin. But with traditional CDNs, doing PoP-level and host-level balancing in the same infrastructure gets really complicated.
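As a rough illustration of the round robin idea, the sketch below hands out a fixed list of addresses in rotating order. The function name and the addresses (taken from a range reserved for documentation) are assumptions made for this example, not anything a particular DNS server exposes.

```python
from itertools import cycle

def round_robin_resolver(addresses):
    """Return a function that hands out the given addresses in rotating order."""
    rotation = cycle(addresses)
    return lambda: next(rotation)

# Hypothetical server addresses from the 192.0.2.0/24 documentation range.
resolve = round_robin_resolver(["192.0.2.1", "192.0.2.2", "192.0.2.3"])

# Six consecutive lookups walk through the list twice.
answers = [resolve() for _ in range(6)]
```

Each lookup gets the next address regardless of how busy that server is, which is why this approach is simple but blind to actual load.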
In order to perform, load balancers require state, and increased state means more cost. But without a load balancer to protect your network, any increased traffic peaks will cause the system to fail. So there’s a definite balance that needs to be achieved to protect your network, while saving on cost.
To help combat some load balancing issues, some CDNs apply Equal Cost Multi Path (ECMP) routing. With ECMP, the routing table holds multiple entries for the same destination network, and each entry points to a different next hop, so traffic is spread without tracking individual connections. That makes it stateless and, therefore, cheaper. If you take out a server, though, the routing table changes and flows get remapped, causing connection resets.
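To see why removing a server disturbs existing connections, here is a minimal sketch of hash-based next-hop selection, assuming a plain "hash the flow, take it modulo the number of next hops" scheme. Real switches use their own hash functions; the server names, the flow tuples and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib

def flow_hash(flow):
    """Deterministic hash of a flow tuple (src_ip, src_port, dst_ip, dst_port, proto)."""
    key = "|".join(map(str, flow)).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")

def ecmp_next_hop(flow, next_hops):
    """Stateless choice: a flow keeps its next hop only while the table is unchanged."""
    return next_hops[flow_hash(flow) % len(next_hops)]

# Hypothetical servers and client flows.
servers = ["srv-a", "srv-b", "srv-c", "srv-d"]
flows = [("198.51.100.7", 40000 + i, "192.0.2.10", 443, "tcp") for i in range(1000)]

before = {f: ecmp_next_hop(f, servers) for f in flows}
after = {f: ecmp_next_hop(f, servers[:-1]) for f in flows}  # srv-d taken out

# Flows that were NOT on the removed server but still landed somewhere new.
moved = sum(1 for f in flows if before[f] != "srv-d" and before[f] != after[f])
```

In this simplified model, `moved` counts connections that had nothing to do with the removed server and were reshuffled anyway. Those are the flows that would see a reset.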
Because of this flaw, Fastly developed faild, which builds on ECMP by controlling the ECMP entries that go into the routing tables. It creates fake next hops that encode information telling the servers where a packet should be routed next. Once this is applied to their infrastructure, they can more easily control servers and manage any failures across the board.
The appeal of faild is the tradeoff it strikes between ECMP and load balancing. With ECMP, once you drain a server its active flows are killed and gone for good. With a load balancer, you save all the flows, but you pay for a constant amount of state. Faild sits between the two and lets you choose how much of each cost you’re willing to accept.
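The idea of fake next hops that tell servers where to route next can be sketched as follows. This is a toy model based only on the description above, not Fastly's implementation: the fixed count of virtual hops, the `(current, previous)` encoding and the shared `established` dictionary (standing in for each server checking its own connection table) are all assumptions.

```python
import hashlib

NUM_VIRTUAL_HOPS = 12  # assumed fixed count; it never changes as servers come and go

def bucket_for(flow):
    """Map a flow tuple onto one of the fixed virtual next hops."""
    key = "|".join(map(str, flow)).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % NUM_VIRTUAL_HOPS

def build_table(servers):
    """Each virtual hop encodes (current_owner, previous_owner)."""
    return {b: (servers[b % len(servers)], None) for b in range(NUM_VIRTUAL_HOPS)}

def drain(table, server, remaining):
    """Reassign only the drained server's virtual hops, remembering the old owner."""
    new_table = dict(table)
    owned = [b for b in sorted(table) if table[b][0] == server]
    for i, b in enumerate(owned):
        new_table[b] = (remaining[i % len(remaining)], server)
    return new_table

def deliver(flow, table, established):
    """Existing flows are handed back to the previous owner; new flows stay put."""
    current, previous = table[bucket_for(flow)]
    if established.get(flow) == previous and previous is not None:
        return previous
    return current

# Hypothetical servers and flows.
servers = ["srv-a", "srv-b", "srv-c", "srv-d"]
flows = [("198.51.100.7", 40000 + i, "192.0.2.10", 443, "tcp") for i in range(200)]
table = build_table(servers)
established = {f: table[bucket_for(f)][0] for f in flows}

drained = drain(table, "srv-d", ["srv-a", "srv-b", "srv-c"])
existing_kept = all(deliver(f, drained, established) == established[f] for f in flows)
new_avoid_drained = all(deliver(f, drained, {}) != "srv-d" for f in flows)
```

Because the number of virtual hops is fixed, draining a server never changes which hop a flow hashes to. Only the ownership of a few hops changes, and the extra state is needed only while old connections finish.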
Routing is often viewed as something simple: you build a PoP, buy a router, get a BGP table from each provider, install the routes into the FIB, and have the servers use the router as their default gateway. But Fastly went in another direction, building their PoPs using switches, then reflecting BGP down to the servers and injecting multipath routes into each server’s FIB.
The limitation of using a switch is that it doesn’t have enough FIB space to hold all those routes, but it is capable of receiving routes that it can then pass on to the servers. What Fastly did was implement a distributed system that spreads the network configuration around, allowing each server to send traffic out through separate providers individually. Routers, on the other hand, don’t do this and will only give you a single path.
Routers always choose a single best path, but which path is best depends on what kind of traffic you have. A video might benefit from a different path than the one with the best RTT, which is something to keep in mind when choosing or building your CDN.
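A small sketch of what choosing a path per traffic type could look like on a server that sees every provider. The provider names, the RTT and throughput figures and the rule "video prefers throughput, everything else prefers low RTT" are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Path:
    provider: str
    rtt_ms: float
    throughput_mbps: float

def pick_path(paths, traffic_class):
    """Pick a path by what matters for this kind of traffic (assumed policy)."""
    if traffic_class == "video":
        # Large transfers care more about sustained throughput.
        return max(paths, key=lambda p: p.throughput_mbps)
    # Small, interactive requests care more about round-trip time.
    return min(paths, key=lambda p: p.rtt_ms)

# Hypothetical measurements for two upstream providers.
paths = [
    Path("provider-a", rtt_ms=20.0, throughput_mbps=50.0),
    Path("provider-b", rtt_ms=35.0, throughput_mbps=400.0),
]

video_choice = pick_path(paths, "video").provider
api_choice = pick_path(paths, "api").provider
```

A router with a single best path would send both kinds of traffic the same way. Here the two requests leave through different providers.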
When choosing a CDN, it is also important to keep security in mind. The nature of DDoS attacks involves an attacker who has more bandwidth than you, sending tons of packets your way to crash your servers.
In order to help combat these attackers, there are BGP communities out there that tag routes, which you can use to tell your provider to withdraw routes that have been compromised. The attacker may control the edge route, but not necessarily the routing infrastructure.
This is another one of those instances where CDN marketing can get you in trouble. When it comes to security, bandwidth is meaningless without context. If it’s spread out across the world, then it doesn’t help you with certain issues, especially DDoS attacks.
If your CDN can’t do things like arbitrary route injection, remotely triggered black holes, policy updates and so on, your network becomes much more vulnerable. Add a lack of consistency, a non-synchronous system and no global connections, and you become seriously compromised.
Fastly has a logically centralized policy, backed by a distributed system from which all routing policy emanates. Operators only interact with one policy, and when it changes, the change is pushed everywhere at the same time, which helps them better protect themselves on all fronts.
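As one concrete case, a remotely triggered black hole is usually requested by announcing the attacked address with a special BGP community attached. The sketch below models that as plain data. The community value 65535:666 is the well-known BLACKHOLE community from RFC 7999, while the function names, the dictionary layout and the victim address are assumptions for illustration only.

```python
import ipaddress

BLACKHOLE = (65535, 666)  # well-known BLACKHOLE community (RFC 7999)

def blackhole_announcement(victim_ip):
    """Build a host-route announcement tagged so upstreams drop traffic to it."""
    host_route = ipaddress.ip_network(victim_ip)  # /32 for IPv4, /128 for IPv6
    return {"prefix": str(host_route), "communities": [BLACKHOLE]}

def provider_action(announcement):
    """What a cooperating provider would do on receiving the route (simplified)."""
    return "discard" if BLACKHOLE in announcement["communities"] else "forward"

# Hypothetical address under attack, taken from a documentation range.
announcement = blackhole_announcement("192.0.2.55")
action = provider_action(announcement)
```

The traffic is dropped inside the provider's network, before it can saturate the links into the PoP.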
The fundamental theme of the Fastly network proves to be resource pooling–taking a collection of resources and making them act as one.
- Load Balancing: pool of servers
- Routing: pool of providers
- Mitigation: pool of PoPs
For instance, when you contact Fastly, it looks like a single IP, but behind it is a pool drawn from a number of PoPs. The same happens with routing, when they mirror the providers down to the host as if they were one link. The same goes for mitigation: take something that works in isolation and use it together against a common threat.
And while this works for Fastly, no one else can really do this from scratch, because once you go down a path it’s too hard and expensive to do anything else.
What traditional CDNs have been offering is a general purpose service, but with single use architecture. To them, networking was only meant to be a sunk cost and if you ever need to change anything, things get expensive and complicated. With Fastly, though, you can redefine your network based on your specific needs, because when it comes down to it–it’s all just software.