CDN, Embrace State or Die  

The dot-com period was the Wild West of the ’90s. Fortunes were made and lost. Startups exploded onto the scene, only to go bust later. Out of the ashes, Google and Akamai were born.

Larry Ellison was in his prime, bad-mouthing competitors, and Oracle was the #1 database in the market. Only three database choices existed for the dot-com startup: Oracle, Microsoft SQL Server, and Informix. Startups with money bought Oracle; those with less money bought Microsoft or Informix. DB2 wasn’t a player.

Fast forward twenty years and some things haven’t changed. Larry Ellison still bad-mouths the competition, and Oracle is still the #1 database in the world. However, the database market is crowded. Today, there are 331 different database products in a highly segmented market, with plenty more being developed in garages around the world.

Yet there aren’t enough database products to satisfy the specialized needs of cloud-native or microservices-based distributed applications, because many of the existing products are legacy. The good news: startups are introducing a new crop of distributed SQL databases and NoSQL products that are challenging AWS’s hold on the market. Not only that, they are also challenging newer database products like Google Spanner and Azure Cosmos DB.

Product Gap

Look at the product roadmap for any CDN today and the most important feature is missing: the database. The database provides “state” to the otherwise stateless architecture of the CDN. The CDN is faced with a stark choice: embrace state or die. In this case, state equals change. The ability to provide state in the CDN PoP in the next few years determines whether the CDN survives or dies. State doesn’t have to be in every PoP; it can live in regions, as illustrated in the graphic below. The distributed database is the feature that should be on every CDN product roadmap. Period. And using AWS/GCP/Azure for that function is a non-starter.

Serverless, FaaS, containers, and microservices at the edge provide minimal benefit if requests must travel back to AWS/Azure/GCP to fetch dynamic content stored in the database. Only one CDN provides some degree of state at the edge: Cloudflare. No one else.
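
To make the round-trip argument concrete, here is a minimal sketch in Python. Every name in it (EdgePoP, origin_db, the "cart:42" key) is hypothetical and invented for illustration; it is not any CDN's actual API. It only counts how often a PoP has to go back to a central cloud region, assuming a stateless PoP holds nothing locally and a stateful PoP keeps data in a local store.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class EdgePoP:
    """A hypothetical PoP; local_store=None models a stateless PoP."""
    name: str
    fetch_from_origin: Callable[[str], str]  # stands in for a trip to the cloud region
    local_store: Optional[Dict[str, str]] = None
    origin_round_trips: int = 0

    def handle(self, key: str) -> str:
        # Stateful path: answer from data already held at the edge.
        if self.local_store is not None and key in self.local_store:
            return self.local_store[key]
        # Stateless path (or a local miss): pay for a trip back to the origin.
        self.origin_round_trips += 1
        value = self.fetch_from_origin(key)
        if self.local_store is not None:
            self.local_store[key] = value  # keep it at the edge for next time
        return value


def origin_db(key: str) -> str:
    """Stand-in for a database living in a central cloud region."""
    return f"value-for-{key}"


stateless = EdgePoP("pop-stateless", origin_db)
stateful = EdgePoP("pop-stateful", origin_db, local_store={})

for _ in range(3):
    stateless.handle("cart:42")
    stateful.handle("cart:42")

print(stateless.origin_round_trips)  # 3
print(stateful.origin_round_trips)   # 1
```

This toy only models reads. A real distributed database at the edge also has to handle writes, replication between PoPs or regions, and consistency, which is where the hard work lies.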

The solution is to hack an open-source database and make it multi-tenant, scalable, and consistent. It’s a challenging job, but no harder than building a globally distributed WAF. A few startups outside of the CDN industry have already succeeded in developing distributed database products (we’ll cover them in the next post).

Providing State

Why is state so important to the CDN architecture? Because it gives life and meaning to applications running at the edge. Technically, it means the PoP (or a region like APAC) becomes a self-contained unit, which reduces the reliance on AWS. The biggest benefit of this feature: CDNs can capture some of the spend intended for AWS.

The distributed database will be the core building block that turns a stateless CDN into a stateful CDN. After all, isn’t this the reason the entire world is on the edge bandwagon: to break AWS’s stranglehold on the cloud market? Once this is accomplished, data can be ingested, processed, stored, and delivered at the edge without AWS. For the CDN, that’s the dream.

Existential Threat

CDNs that are unable to provide state at the edge will die in a few years. This is the existential threat facing the CDN industry: existential as in what Netflix did to Blockbuster, not what Amazon did to Barnes and Noble. It’s not all bad news. Those that embrace the distributed database and make a concerted effort to transform it into a multi-tenant, highly scalable product, with fast reads and writes and strong consistency, will be rewarded with the opportunity to compete with AWS in areas beyond content delivery.

The one key takeaway: AWS is a jack of all trades and master of none. The CloudFront technology stack is inferior to the next-gen CDN stack, DocumentDB is inferior to MongoDB, RDS and Aurora are inferior to CockroachDB, and so on. At least, that’s what some vendors claim, and there is some truth to it.

Akamai Conundrum

Akamai is missing out on the action for the second time in history. Akamai and Google started the same year; the search company became a major cloud company, while Akamai became a major CDN. Akamai is repeating history by focusing solely on the security market. Security is a decent market, but the cloud (including databases) is at another level. Take five of the biggest security companies (Palo Alto Networks, Fortinet, Check Point, FireEye, and Symantec), sum up their 2018 revenues, and it amounts to $11.7B. AWS booked more than double that in 2018, roughly $25B. Yes, it’s a weak comparison, but it gets the message across.

Following the Leader

Cloudflare is the only CDN in the world to have embraced state. Its offering is known as Workers KV. It’s not perfect, but it’s bound to improve immensely over time. In case anyone is wondering, Workers and Workers KV are profound and monumental game changers. No other CDN feature in the industry comes close, and that’s counting the Akamai feature set. We predict that StackPath will be the next startup to develop and introduce a distributed database, as they’re right on the heels of Cloudflare. We also predict that in 12-16 months, 3-5 more CDNs will have a basic version of the distributed database on the market.

Database Market

Presently, there are two types of distributed databases that are a good fit for the CDN business model: NewSQL and NoSQL. There are numerous open-source products available in each category. NewSQL databases are relational databases that support ACID transactions, offer good consistency guarantees, and handle a high volume of reads and writes. NoSQL is an umbrella term that covers four different types of databases: key-value, document, graph, and wide-column stores.
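
As a rough illustration of how the four NoSQL categories differ, the sketch below models the same fact (a user named Ada who lives in London) in each shape using plain Python data structures. The record, field names, and layouts are invented for illustration and do not mirror any particular product’s storage format.

```python
import json

# Key-value: the store sees an opaque value behind a single key.
key_value = {"user:1": '{"name": "Ada", "city": "London"}'}

# Document: the store understands the nested structure and can query fields.
documents = [{"_id": 1, "name": "Ada", "address": {"city": "London"}}]

# Graph: nodes plus explicit, typed edges between them.
nodes = {1: {"name": "Ada"}, 2: {"name": "London"}}
edges = [(1, "LIVES_IN", 2)]

# Wide-column: row key -> column family -> column -> value.
wide_column = {
    "user:1": {"profile": {"name": "Ada"}, "location": {"city": "London"}}
}

# The same question ("where does Ada live?") asked of each model.
city_kv = json.loads(key_value["user:1"])["city"]  # app must parse the value
city_doc = next(d["address"]["city"] for d in documents if d["name"] == "Ada")
city_graph = next(
    nodes[dst]["name"] for src, rel, dst in edges if src == 1 and rel == "LIVES_IN"
)
city_wide = wide_column["user:1"]["location"]["city"]

print(city_kv, city_doc, city_graph, city_wide)  # London London London London
```

The key-value shape is the simplest to distribute because the store never looks inside the value, which is one reason it is a natural first step for a CDN.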

Some of the popular NewSQL products are Google Spanner, CockroachDB, Yugabyte, and NuoDB. Amazon Aurora is a relational database; however, since it is a “significant rewrite” of MySQL, it’s not at the same level as Google Spanner and the others mentioned. NewSQL is the more complex of the two to develop. For CDNs that haven’t raised a bunch of capital, going the NoSQL route is the better choice. In the next post, we’ll publish an analysis of the database market, exploring key concepts like multi-model, consensus algorithms, replication, quorums, time clocks, etc.