Stateful CDN Architecture: The CDN Database

The industry’s quest to develop the next billion-dollar product is in play. The last product to transform the CDN industry was cloud security. Now it’s on to the next idea. Some believe edge compute is the next billion-dollar product. It is, but not for the CDN.

The CDN technology stack comprises caching, load balancing, WAF, and log processing, all of them stateless for all intents and purposes. Adding more stateless services to the stack, such as functions (FaaS), won’t make a difference without state. The logic is simple: new stateless services like functions depend on AWS whenever state is needed, and whenever AWS is involved, the money tends to follow AWS.
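To make the stateless point concrete, here is a minimal Python sketch. The names `StateStore`, `stateless_handler`, and `stateful_handler` are hypothetical and do not come from any CDN or AWS API. `StateStore` is only an in-memory stand-in for the kind of external database a function would have to call.

```python
from dataclasses import dataclass, field


@dataclass
class StateStore:
    """Hypothetical stand-in for an external database reachable from the edge."""
    data: dict = field(default_factory=dict)

    def increment(self, key: str) -> int:
        # Read-modify-write; a real distributed store would have to make this atomic.
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]


def stateless_handler(user_id: str) -> int:
    # Nothing survives between invocations, so the counter restarts every time.
    visits = 0
    visits += 1
    return visits


def stateful_handler(user_id: str, store: StateStore) -> int:
    # The same function, with its state delegated to an external store.
    return store.increment(f"visits:{user_id}")


store = StateStore()
stateless_results = [stateless_handler("alice") for _ in range(3)]
stateful_results = [stateful_handler("alice", store) for _ in range(3)]
print(stateless_results)  # [1, 1, 1]
print(stateful_results)   # [1, 2, 3]
```

The function only becomes useful for something like a visit counter once a store sits behind it, and whoever operates that store captures the database revenue.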

The real opportunity for the CDN is “state”. The CDN must develop a distributed database that provides state to functions and serverless. Since the database is a core product of AWS, among hundreds of other services, CDNs can gradually steal business away from AWS after a product is developed. Containers and microservices make applications more portable, therefore making it easier for the CDN to capture market share.

CDNs can’t compete with AWS in all services, but they sure can on specific products like the distributed database. AWS is a generalist, not best of breed. AWS Lambda, DocumentDB, Aurora, CloudFront, AWS WAF, etc. are not feature-rich and usually lag behind best-of-breed products. In the database market, some startups have introduced more feature-rich database products that are faster, more consistent, and more scalable globally. We’ll review these claims in the next post.

The open source database ecosystem is abundant in tools, research, database products, best practices, cheat sheets, etc., so CDNs don’t need to start from scratch. In this post and the next, we’ll review the general database market, and summarize academic research, database concepts, features, design principles, and open source products.

Why are we publishing this? Because some startups have asked for our opinion. Our first response: do not use AWS, GCP, or Azure. Instead, build your own product. It’s not as difficult as some portray. A lot of the groundwork has already been done.

Database Market

The global database market in 2018 was $34B, with open source databases representing $2.6B. There are 331+ database products today and several more being developed in stealth mode. The market is highly segmented. Databases have been developed to cater to many different types of use cases. For example, IoT is the driving force behind instrumentation, a process where consumer devices generate time-series data. As a result, time-series databases have emerged, a niche has been created, and the market is growing rapidly.

Although the database market is crowded, many companies have built successful brands in both distributed SQL (NewSQL) and NoSQL. Some of the more popular non-AWS brands are MongoDB, Redis, Cassandra, Couchbase, InfluxDB, FaunaDB, CockroachDB, and Yugabyte. Yet there are some under-the-radar products that are very important foundational pieces, on which some of the newer databases just mentioned were built.

Database Types

There are three primary database categories in the market: RDBMS, NewSQL, and NoSQL. Under each category, there are dozens of brands, many of them open source. All these database products have been developed to cater to the many different use cases. No database product is best for all use cases, regardless of what vendors say.

Therefore, the CDN must choose wisely which type of database to develop. Much of that will depend on the existing customer base. If a CDN has many ecommerce clients, then NewSQL is probably the way to go. If it has media clients with unstructured data, then NoSQL is probably the way to go.

Database Categories

  • Traditional RDBMS (SQL)
  • Distributed SQL (NewSQL)
  • Non-relational (NoSQL)

Under each category, there are numerous popular database brands. The list below summarizes some of them.

Popular Brands

  • SQL: Oracle, Microsoft SQL Server, MySQL, AWS Aurora, Google Cloud SQL, and PostgreSQL (object-relational)
  • NewSQL: Google Spanner, CockroachDB, FaunaDB, and NuoDB
  • NoSQL: MongoDB, Azure Cosmos DB, AWS DynamoDB, and Google Cloud Bigtable

The NoSQL category has four different types of databases: key-value, document, graph, and wide-column. There are probably more out there, but this is a good representation for now.

NoSQL Brands

  • Key-value: Redis, Memcached, and DynamoDB
  • Document: MongoDB and Couchbase
  • Graph: Neo4j, Azure Cosmos DB, and OrientDB
  • Wide-column: Cassandra, HBase, and Azure Table Storage
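For readers who want the NoSQL list above in machine-readable form, it can be sketched as a small Python mapping. The names `NOSQL_BRANDS` and `nosql_type_of` are made up for illustration, and the mapping mirrors only the brands listed in this post, so it is not a complete catalog.

```python
from typing import Optional

# NoSQL types and brands exactly as listed in this post (not exhaustive).
NOSQL_BRANDS = {
    "key-value": ["Redis", "Memcached", "DynamoDB"],
    "document": ["MongoDB", "Couchbase"],
    "graph": ["Neo4j", "Azure Cosmos DB", "OrientDB"],
    "wide-column": ["Cassandra", "HBase", "Azure Table Storage"],
}


def nosql_type_of(brand: str) -> Optional[str]:
    """Return the NoSQL type a brand is listed under, or None if it is not listed."""
    wanted = brand.casefold()
    for nosql_type, brands in NOSQL_BRANDS.items():
        if any(b.casefold() == wanted for b in brands):
            return nosql_type
    return None


print(nosql_type_of("Cassandra"))  # wide-column
print(nosql_type_of("neo4j"))      # graph
print(nosql_type_of("Oracle"))     # None
```

Some products are multi-model in practice, so a one-type-per-brand mapping like this is a simplification of the list rather than a statement about any vendor.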

Now here is a graph that includes the data above and is easier on the eyes.

In the next post, we’ll dive immediately into the technical aspects of the CDN database.