Cloudflare’s serverless cloud computing platform is called Workers. Unlike most cloud computing platforms, Workers doesn’t rely on containers or virtual machines to run code. Instead, it uses V8 Isolates. The team at Cloudflare hails this as “the future of serverless and cloud computing in general”. This post will look at what V8 is, why Cloudflare believes it represents the future, and why they are using it as the key ingredient for their serverless platform.
What is V8 and what does it do?
What are V8 Isolates?
A V8 Isolate is an isolated instance of the V8 engine. Cloudflare describes Isolates as “lightweight contexts that group together variables with the code that mutates them”. A single process can run large numbers of Isolates, switching between them seamlessly. This makes it possible to run untrusted code from multiple customers within a single operating system process. Each Isolate has its own separate state. Isolates are designed to spin up extremely quickly and to prevent one Isolate from accessing the memory of another.
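To make the idea of “variables grouped with the code that mutates them” a little more concrete, here is a toy model in TypeScript. It is only an analogy built from ordinary closures, not the V8 embedder API, and the names `createToyIsolate`, `tenantA` and `tenantB` are made up for illustration. Real Isolates enforce this separation inside the engine, with separate heaps, rather than relying on language scoping.

```typescript
// Toy model only: plain closures standing in for Isolates.
// This is NOT the V8 API; every name here is hypothetical.
type Handler = (input: string) => string;

// Each call creates a fresh scope whose state only the returned handler can reach.
function createToyIsolate(tenant: string): Handler {
  let requestCount = 0; // private to this "isolate"
  return (input: string): string => {
    requestCount += 1;
    return `${tenant} #${requestCount}: ${input}`;
  };
}

// Two tenants hosted side by side in the same process.
const tenantA = createToyIsolate("tenant-a");
const tenantB = createToyIsolate("tenant-b");

const firstA = tenantA("hello");  // "tenant-a #1: hello"
const secondA = tenantA("again"); // "tenant-a #2: again"
const firstB = tenantB("hello");  // "tenant-b #1: hello"
```

Calling `tenantA` twice advances only its own counter, and `tenantB` still starts from one, which mirrors the separate state described above.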
Updates over the Last Decade
V8 has been continually improved and updated over the last decade of its life. Its benchmark performance has improved fourfold over that period.
Some of the improvements include:
The Latest Version: V8 7.1
As of earlier this month, V8 7.1 is available in beta.
In terms of memory, bytecode handlers for the interpreter are now embedded into the binary, which saves around 200KB per Isolate. In terms of performance, the TurboFan compiler’s escape analysis, which performs scalar replacement for objects that are local to an optimization unit, has been enhanced to also handle local function contexts for higher-order functions, where variables from the surrounding context escape to a local closure.
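The kind of code that escape analysis change is aimed at looks roughly like the sketch below. The function name `addOffset` is made up for illustration, and whether TurboFan actually applies the optimization in a given run depends on engine internals such as inlining decisions, so treat this as an example of the code shape rather than a guaranteed speed-up.

```typescript
// `offset` lives in addOffset's function context and is captured by the
// arrow function handed to the higher-order function `map`.
function addOffset(values: number[], offset: number): number[] {
  return values.map((value) => value + offset);
}

const shifted = addOffset([1, 2, 3], 10); // [11, 12, 13]
```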
Some key WebAssembly features are now enabled, including:
- postMessage support for WebAssembly modules, so that compiled modules can be passed to web workers. This is scoped to web workers and not yet extended to cross-process scenarios
- An early preview of WebAssembly Threads, enabled by a feature flag: chrome://flags/#enable-webassembly-threads
How does Cloudflare use V8?
Cloudflare began to develop Workers when faced with a problem. There was a limit to the number of features and options Cloudflare could build internally, so they wanted customers to be able to write code and build applications themselves. The goal was to find a solution that allowed customers to write code that runs on Cloudflare’s servers deployed worldwide (then 100, now over 150). It had to run extremely quickly: Cloudflare processes millions of requests per second and sits in front of over ten million sites. Cloudflare had previously used Lua, but because it didn’t run in a sandbox, customers couldn’t run their own code independently. Traditional virtualization and container technologies such as Kubernetes would have been too expensive and too resource intensive. Eventually, Cloudflare settled on V8 Isolates, which are built to start very quickly. A single process can run hundreds or thousands of Isolates, switching between them seamlessly. This means that code from many different customers can run in the same operating system process. Isolates consume far less memory than comparable systems and don’t use a virtual machine or a container, meaning that you are running much closer to the metal than with most other forms of cloud computing.
The Difference with Traditional Serverless
In a blog post on Cloudflare’s use of V8, Director of Product for Product Strategy, Zack Bloom says he believes “it’s possible with this model to get close to the economics of running code on bare metal, but in an entirely Serverless environment”. Bloom says this marks not just “iterative improvement but an actual paradigm shift”.
Traditional serverless platforms like Lambda work by spinning up a containerized process for your code. Rather than running the code in a lightweight environment, the platform autoscales containerized processes, which creates cold starts. A cold start is what occurs when a new copy of your code has to be started on a machine. In Lambda, spinning up a new containerized process can take between 500 milliseconds and 10 seconds. Requests that take up to ten seconds can lead to a bad user experience. Worse, because a Lambda can only process one request at a time, every additional concurrent request requires a new Lambda to be cold-started, so the user experience can worsen as that long delay is repeated over and over. In addition, if a Lambda doesn’t receive a request soon enough, it is shut down and the process begins again. Whenever new code is deployed, every Lambda has to be redeployed and the whole process happens again.
By contrast, Workers doesn’t need to start up a new process each time. Isolates start in about 5 milliseconds, which is almost imperceptible. They scale and deploy just as quickly as they start, eliminating the issue of cold starts and laggy requests.
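A small simulation can show how the one-request-per-instance model multiplies cold starts during a burst. This is a simplified sketch under stated assumptions, not a model of Lambda’s actual scheduler: each instance handles one request at a time, idle instances are always reused and never shut down, and the names `Invocation` and `countColdStarts` as well as the sample timings are hypothetical.

```typescript
// Simplified model: one request per instance at a time, idle instances are
// reused, and instances are never shut down. All names are hypothetical.
interface Invocation {
  startMs: number;    // arrival time of the request
  durationMs: number; // time the code itself needs to run
}

function countColdStarts(invocations: Invocation[], coldStartMs: number): number {
  const freeAt: number[] = []; // time at which each warm instance becomes idle
  let coldStarts = 0;
  const ordered = [...invocations].sort((a, b) => a.startMs - b.startMs);
  for (const inv of ordered) {
    const idle = freeAt.findIndex((t) => t <= inv.startMs);
    if (idle === -1) {
      // No idle instance: start a new one and pay the cold-start delay first.
      coldStarts += 1;
      freeAt.push(inv.startMs + coldStartMs + inv.durationMs);
    } else {
      // Reuse a warm instance.
      freeAt[idle] = inv.startMs + inv.durationMs;
    }
  }
  return coldStarts;
}

// Three overlapping requests, then one that arrives after the burst is over.
const burst: Invocation[] = [
  { startMs: 0, durationMs: 50 },
  { startMs: 10, durationMs: 50 },
  { startMs: 20, durationMs: 50 },
  { startMs: 2000, durationMs: 50 },
];
const coldStartsInBurst = countColdStarts(burst, 500); // 3
```

In this sample, the three overlapping requests each trigger a cold start, while the later request reuses an instance that has gone idle.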
All operating systems allow you to run multiple processes at once. The operating system switches between the different processes that want to run code at a given time. It does this via a ‘context switch’, moving the memory needed for one process out and the memory needed by the next one in. This can take as much as 100 microseconds. Multiplied by all the Node, Go or Python processes running on an average Lambda server, this creates a heavy overhead, which means some of the CPU’s power is spent switching between customers’ code rather than running it.
The Isolate-based system, by contrast, runs all of the code in one process and relies on its own mechanisms to maintain safe memory access. This avoids expensive context switches, so the machine can spend most of its time running your code.
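As a back-of-the-envelope illustration of that overhead, the sketch below computes the share of CPU time lost to switching. The 100 microsecond switch cost is the upper bound quoted above. The 900 microseconds of useful work between switches is a purely hypothetical number chosen for easy arithmetic, and `switchOverheadFraction` is an illustrative name.

```typescript
// Share of CPU time spent on context switches when every slice of useful
// work is followed by one switch. All inputs are in microseconds.
function switchOverheadFraction(switchCostUs: number, workPerSliceUs: number): number {
  return switchCostUs / (switchCostUs + workPerSliceUs);
}

// 100 µs per switch (upper bound from the text), 900 µs of work (assumed).
const overhead = switchOverheadFraction(100, 900); // 0.1, i.e. 10% of CPU time
```

Under those assumed numbers, a tenth of the CPU’s time goes to switching rather than to customer code. With no process-level switches between tenants, that term drops out.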
Memory and Cost in Multi-tenant Systems
A basic Node Lambda not running any real code consumes 35MB of memory, because Node was built to run on a single server, not in a multi-tenant environment with strict memory requirements. When the runtime is shared between Isolates, however, only around 3MB of memory is consumed. Memory is typically the highest cost of running customer code (even more so than the CPU), so lowering it this significantly can dramatically change the economics. V8 was built to be multi-tenant: it was designed to run the code from all the tabs in your browser in isolated environments within a single process.
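To get a feel for what those two figures mean for density, here is a quick calculation. The 35MB and 3MB values come from the paragraph above. The 16GB host, the assumption that all of its memory is available to tenants, and the name `tenantsPerHost` are hypothetical.

```typescript
// How many idle tenants fit on one host if memory is the only constraint.
function tenantsPerHost(hostMemoryMb: number, perTenantMb: number): number {
  return Math.floor(hostMemoryMb / perTenantMb);
}

const hostMemoryMb = 16 * 1024; // hypothetical 16GB host

const nodeProcesses = tenantsPerHost(hostMemoryMb, 35); // 468
const isolates = tenantsPerHost(hostMemoryMb, 3);       // 5461
```

On those assumptions, the same machine holds more than eleven times as many idle tenants when they run as Isolates.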
Lambdas are billed based on the length of time they run for. Billing is rounded up to the nearest 100 milliseconds, which can lead to customers overpaying, particularly as you also pay for the time it takes for an external request to complete, which in a multi-tenant system at scale can be significant. As Isolates have a far smaller memory footprint, Cloudflare can bill its customers only for the time when code is actually executing, not for time spent waiting. Cloudflare claims that Workers can be 3x cheaper per CPU cycle. A Worker that offers 50 milliseconds of CPU costs $0.50 per million requests. According to Cloudflare, the equivalent for Lambda would be $1.84 per million.
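The effect of the rounding, and the ratio between the two quoted prices, can be worked through in a few lines. The 100 millisecond increment and both prices come from the paragraph above. The sample durations and the names `billedDurationMs`, `billedShort` and `billedWaiting` are hypothetical.

```typescript
// Round a wall-clock duration up to the next 100 ms billing increment.
const BILLING_INCREMENT_MS = 100;

function billedDurationMs(wallClockMs: number): number {
  return Math.ceil(wallClockMs / BILLING_INCREMENT_MS) * BILLING_INCREMENT_MS;
}

// A 50 ms invocation is billed as 100 ms.
const billedShort = billedDurationMs(50); // 100

// 20 ms of compute plus 100 ms waiting on an external request is billed as 200 ms.
const billedWaiting = billedDurationMs(20 + 100); // 200

// Price per million requests as quoted by Cloudflare.
const workersPricePerMillion = 0.5;
const lambdaPricePerMillion = 1.84;
const priceRatio = lambdaPricePerMillion / workersPricePerMillion; // about 3.68
```

The ratio of the quoted prices, roughly 3.68, is in line with the “3x cheaper” claim.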
Running code simultaneously within the same process requires attention to security. For Cloudflare, building an isolation layer would have been far too expensive. In Bloom’s words, “The only reason this was possible at all is the open-source nature of V8, and its standing as perhaps the most well security tested piece of software on earth. We also have a few layers of security built on our end, including various protections against timing attacks, but V8 is the real wonder that makes this compute model possible”.
The Limitations of V8