Stateful vs. Stateless Architecture Overview

August 21, 2018

Stateful vs. Stateless – An Overview

The key difference between stateful and stateless applications is that stateless applications don’t “store” data whereas stateful applications require backing storage. Stateful applications such as the Cassandra, MongoDB and MySQL databases all require some type of persistent storage that will survive service restarts.

Keeping state is critical to running a stateful application, whereas data that flows through a stateless service is typically transitory, and any state is stored only in a separate back-end service such as a database. Any storage associated with the stateless service itself is typically ephemeral: if the container restarts, for instance, anything stored in it is lost. As organizations adopt containers, they tend to begin with stateless containers, which are more easily adapted to this new type of architecture, more easily separated from the monolithic application codebase, and therefore more amenable to independent scaling.

Containerization: In Summary

The containerization of applications has become widely popular in recent years as microservices and cloud computing have likewise exploded in popularity. Many tech companies, from startups to large enterprises, are experimenting with containerization. Containers (also known as partitions, virtualization engines [VEs] or jails) typically look like standard computers, i.e. programs run inside them. However, compared to a computer running an ordinary operating system, programs running inside a container are only able to see the container’s contents and the devices assigned to that specific container.

Resource management features are often set to limit the effect of one container’s activities on other containers. Multiple containers can be built on each operating system, and a share of the computer’s resources is allocated to each container. Each container can contain any number of computer programs, which may run at the same time or separately, and/or interact with one another.

Uses for Containers

Containers are typically straightforward and quick to deploy and make effective use of system resources. Container technology, such as Docker Engine, provides standards-based packaging and runtime management of an application’s underlying components. Developers can achieve application portability and programmable image management using containers; operations teams benefit from being able to use standard runtime units of deployment and management. Increasing numbers of companies are investing in container technology. A 2017 Annual Container Adoption Survey showed that license and usage fees for container technologies are increasing.

Barriers to Widespread Adoption of Containers

There are three particular challenges to the widespread adoption of containers:

Persistent, application-aware storage is difficult to provide.
Application lifecycle management must be maintained long after the first day of deployment.
Multi-cloud and hybrid cloud support is a necessity.

Types of Application

Stateless

Stateless applications have just one function or service, such as an IoT device;
They use web, print or CDN servers;
The server processes requests based only on information relayed with each request and doesn’t rely on information from earlier requests – this means that the server doesn’t need to hold onto state information between requests;
Different requests can be processed by different servers;
The fact that any service instance can retrieve all application state necessary to execute a behavior from elsewhere enables resiliency, elasticity, and the ability for any available service instance to execute any task at all;
Stateless applications are essentially containerized microservices apps;
Orchestration for stateless apps helps determine the best location to run the container from the point of view of resources, including maintaining high availability (failover).
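
The points above can be illustrated with a minimal sketch. It assumes a hypothetical shopping-cart service (the names ExternalStore and StatelessCartService are invented for this example) in which a plain dictionary stands in for a separate back-end database. Because the service instances hold no data of their own, any instance can process any request.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalStore:
    """Stands in for a separate back-end service such as a database."""
    carts: dict = field(default_factory=dict)


class StatelessCartService:
    """Keeps no per-client data; each request names the user it is for."""

    def __init__(self, store: ExternalStore) -> None:
        self.store = store  # shared back end, not instance-local state

    def add_item(self, user_id: str, item: str) -> list:
        # All state is read from and written to the external store.
        cart = self.store.carts.setdefault(user_id, [])
        cart.append(item)
        return list(cart)


store = ExternalStore()
instance_a = StatelessCartService(store)
instance_b = StatelessCartService(store)

instance_a.add_item("alice", "book")
# A different instance serves the next request and sees the same cart.
result = instance_b.add_item("alice", "pen")
print(result)  # ['book', 'pen']
```

If either instance is restarted or replaced, nothing is lost, since the cart lives only in the external store.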

Stateful

Stateful applications are typically databases;
They involve transactions such as home banking;
They use mail servers;
The server processes requests based on the information relayed with each request and information stored from earlier requests – this means that the server must access and hold onto state information generated during the processing of the earlier request;
The same server must be used to process all requests linked to the same state information, or the state information needs to be shared with all servers that need it;
Orchestration for stateful applications involves determining the best location to run the container collection from the point of view of the applications’ overall needs, including storage, network and I/O path;
Orchestration for stateful applications also manages high availability – moving containers and remounting volumes with no application or code changes.
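
By way of contrast, here is a minimal sketch of a stateful server. The class StatefulCartServer and the session identifier are hypothetical. Each server keeps session data in its own memory, so a request that reaches a different server does not see the state created by earlier requests.

```python
class StatefulCartServer:
    """Holds session state in process memory between requests."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session id -> cart, lost if this process restarts

    def add_item(self, session_id, item):
        cart = self.sessions.setdefault(session_id, [])
        cart.append(item)
        return list(cart)


server_1 = StatefulCartServer("server-1")
server_2 = StatefulCartServer("server-2")

server_1.add_item("s-42", "book")
same_server = server_1.add_item("s-42", "pen")   # earlier state is available
other_server = server_2.add_item("s-42", "pen")  # earlier state is missing

print(same_server)   # ['book', 'pen']
print(other_server)  # ['pen']
```

This is why requests tied to the same state must either reach the same server or the state must be shared among the servers that need it.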

How do Stateful Apps Maintain State Information Between Client Requests?

There are several ways in which the load distribution facilities of an application server can maintain state information between client requests, including:

Transaction affinity – in which the load distribution facility acknowledges a transaction’s existence and tries to direct all requests within that transaction’s scope to the same server;
Session affinity – in which the load distribution facility acknowledges a client session’s existence and tries to direct all requests within that session’s scope to the same server;
Server affinity – in which the load distribution facility acknowledges that while multiple servers might be acceptable for a specific client request, a specific server is best suited for processing that particular request;
In the case of distributed operating systems, the session manager (part of the application server) stores information about each client session and considers session affinity and server affinity while directing client requests to an application server’s cluster members;
The workload management service takes into account both server affinity and transaction affinity when deciding how to direct client requests between the cluster members of an application server.
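
Session affinity, as described above, can be sketched as follows. This is an illustrative toy, not the routing logic of any particular product: the AffinityRouter class, the server names and the hash-based choice are all assumptions made for the example. The router pins each session to one server and only re-routes when that server is no longer available, which mirrors the "tries to direct" wording above.

```python
import hashlib


class AffinityRouter:
    """Hypothetical load distributor that honours session affinity."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.pinned = {}  # session id -> server chosen for that session

    def _pick(self, session_id):
        # Deterministic choice so the example behaves the same on every run.
        digest = hashlib.sha256(session_id.encode("utf-8")).digest()
        return self.servers[int.from_bytes(digest[:4], "big") % len(self.servers)]

    def route(self, session_id):
        server = self.pinned.get(session_id)
        if server not in self.servers:
            # First request, or the pinned server is gone: pick one and pin it.
            server = self._pick(session_id)
            self.pinned[session_id] = server
        return server

    def remove_server(self, server):
        self.servers.remove(server)


router = AffinityRouter(["app-1", "app-2", "app-3"])
first = router.route("session-abc")
print(router.route("session-abc") == first)  # True

router.remove_server(first)          # simulate the pinned server failing
second = router.route("session-abc")
print(second != first)               # True
```

Note that when a session is re-routed after a failure, the new server only has the session's data if that data was shared or replicated elsewhere; the sketch does not handle an empty server list.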

Common Mistakes about Containers

Containers will only work in combination with stateless microservices-style applications.
You can’t containerize stateful applications.

We have to be careful when talking about stateful and stateless applications, as something that appears to fall into one category may not actually belong there. This is mainly because stateless services have become very good at mirroring much of the behavior of stateful services without actually becoming stateful.

Statelessness is about a self-contained state and reference instead of depending on an external frame of reference. The main difference between it and statefulness, as stated above, is where the state is stored. In a stateless system, we are interacting with a limited system. In stateful computing, the state is stored by the client, which generates data of some kind for future use by various systems, i.e. “stateful” computing references a state.

How to Build Stateless Applications?

Docker makes it extremely easy to port, patch and update apps. To run a stateless app in a container, use “docker run”, which simply starts a container from your stateless app image on Docker Hub or another registry.

Statelessness is an essential aspect of the Internet today – from reading the news and using HTTP to connect in a stateless manner (i.e. employing messages that can be separated from one another and your state) to scrolling through Twitter on your phone.

How to Build Stateful Applications?

Building stateful applications is not as straightforward as building stateless ones. DevOps teams might wonder how to handle the stateful aspects of containerization, or how to retrofit a stateless container architecture into a stateful model. A mechanism needs to be found to handle network, persistency and application primitives. Docker has introduced a stop-gap mechanism while a longer-term solution is found: the Docker Storage Plug-in, which could be useful for very simple microservice apps. However, high availability (HA) and scale continue to be an issue with the Docker Storage Plug-in.

The Challenges of Running Stateful Workloads

There are multiple challenges related to running a stateful workload:

Resource isolation – Many of the market’s current container orchestration solutions still take only a best-effort approach to allocating resources such as CPU, memory and storage. This may work for stateless apps, but for stateful ones it can be disastrous, with customer transactions or data lost due to unreliable performance;
Backing storage – Each stateful data service may need or support a different kind of storage (for example block devices or distributed filesystems), and determining the type of backing storage for a stateful application can be challenging;
Ongoing operations or management of the service’s full lifecycle – Running a single instance of a database for testing can be relatively straightforward, but managing the full lifecycle of the stateful app’s service can be highly challenging, including production deployment and operation, which necessitate highly available deployment, scaling and error handling procedures.

These challenges are in part due to the fact that many stateful applications were built for a non-containerized setting. Mapping the generic primitives of a container orchestration platform to stateful services can be extremely time-consuming and difficult to pull off. Organizations may begin by attempting to containerize their stateful services, but then they need to develop highly specific tooling to coordinate numerous related instances for high availability, or employ other sophisticated strategies to deploy, manage or operate these services. This can lead to manual overhead, which can become time-consuming and costly, and/or to the need to develop customized operations for every single service, which can bring considerable operational risk.

Mesosphere DC/OS

Apache Mesos is the open source distributed systems kernel at the center of Mesosphere DC/OS. Mesos abstracts the data center into a single pool of computing resources, thereby simplifying running distributed systems at scale. Mesos supports multiple types of workload to construct applications, including container orchestration (Mesos containers, Docker, and Kubernetes), analytics (Spark), big data technologies (Kafka, Cassandra) and others. Mesos was constructed at UC Berkeley specifically for operations at hyperscale. It was tested and developed with Twitter and Airbnb and continues to support some of the world’s largest applications. Its two-level architecture enables organizations to customize their own operational logic within their apps, making services more straightforward to run and operate. Different types of service can be run on the same infrastructure, minimizing data center footprint and optimizing resource utilization.

The Mesosphere DC/OS integrated platform for data and containers is built on top of Apache Mesos. It automates rollout and production operations for both containers and data services. It can be used effectively to manage stateless and stateful applications, as it offers built-in automation to manage the entire lifecycle of services, including their deployment, placement, security, scalability, availability, failure recovery, and in-place upgrades. Many of the other equivalent technologies on the market involve far less automation and require manual configuration fixes that increase overhead, depend on the skill level of the implementation team, and can be challenging to maintain when employees leave an organization.

DC/OS and Stateful Applications

DC/OS can be used to manage stateless and stateful applications, but can be particularly helpful when it comes to running stateful applications and working around the challenges listed above. In particular, DC/OS simplifies storage management for popular distributed data services, including distributed database Apache Cassandra.

Mesosphere DC/OS offers options for both the local persistent storage and the external volumes that data services might require. Local storage is “local” to the node inside the cluster and tends to be the storage resident within the machine, i.e. internal disks, which can be partitioned for particular services and tend to provide the best performance and data isolation. The downside is that it binds the service or container to just one node. Even so, distributed services such as NoSQL databases tend to work well within this model.

External volumes, meanwhile, tend to be attached to the container service over the network and can take multiple different forms. This makes them applicable to the broadest range of use cases, as the container and storage are separated, allowing containers to move freely around the cluster. Various cloud providers have storage services that fall within the external volume category, such as Amazon’s EBS or S3; so do distributed filesystems, including Gluster, HDFS, or NFS, and storage fabrics, including Ceph, Portworx, Quobyte and others.
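
The placement trade-off between the two storage options can be sketched with a toy model. This is not the DC/OS or Mesos API: the Service class, the place function and the node names are hypothetical and exist only to show that a local volume pins a service to one node, while an external volume lets it run wherever capacity is available.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Service:
    name: str
    volume_kind: str                   # "local" or "external"
    pinned_node: Optional[str] = None  # set once a local volume is created


def place(service: Service, healthy_nodes: List[str]) -> Optional[str]:
    """Return the node to run on, or None if the service cannot be placed."""
    if not healthy_nodes:
        return None
    if service.volume_kind == "local":
        if service.pinned_node is None:
            # First launch: the local volume is created on the chosen node.
            service.pinned_node = healthy_nodes[0]
        # Later launches must return to the node that holds the data.
        return service.pinned_node if service.pinned_node in healthy_nodes else None
    # External volumes are reachable over the network from any node.
    return healthy_nodes[0]


db = Service("cassandra-0", "local")
uploads = Service("uploads", "external")

print(place(db, ["node-a", "node-b"]))       # node-a
print(place(uploads, ["node-a", "node-b"]))  # node-a

# node-a fails: the local-volume service has nowhere to go,
# while the external-volume service moves to node-b.
print(place(db, ["node-b"]))       # None
print(place(uploads, ["node-b"]))  # node-b
```

A distributed database tolerates the pinned case because its other replicas, running on other nodes, keep serving data until the pinned node returns or the replica is rebuilt.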

Running at Scale

Mesosphere DC/OS is noted within the industry for running microservices at scale in production. Its App Ecosystem contains over 100 services, many of which are open source. Mesosphere DC/OS is the only platform that includes supported offerings for data services. It also offers a broad range of capabilities, options and freedom of selection for bringing stateful services to the containerized data center.