The idea of a “service mesh” has become increasingly popular over the last couple of years, and the number of alternatives available has risen. There are four open-source products available today: (i) Linkerd (sponsored by Buoyant), which was built on Twitter’s Finagle library and was the first product to popularize the term “service mesh”; (ii) Envoy (built by Matt Klein and the team at Lyft), designed for use as a “universal data plane” for service mesh architectures or as a standalone proxy; (iii) Istio (initially released as an open-source collaboration between Lyft, IBM, Google and others), designed as a universal control plane and written from the ground up to be platform agnostic; and (iv) Conduit (also sponsored by Buoyant), a simplified service mesh experience for Kubernetes.
AWS and Google each offer their own service mesh: AWS’ is called App Mesh, and Google offers its own version of Istio. These are the two we will focus on here.
Firstly, what is a service mesh?
A service mesh is an infrastructure layer for microservices dedicated to making service-to-service communication controlled, visible and manageable.
A typical service mesh offers the following features:
- Control over routing of requests (e.g. for CI/CD release patterns)
- Cascading failure prevention (e.g. circuit breaking, retries)
- Load balancing algorithms
- Security features (such as TLS, encryption, authentication and authorization)
- Metrics, which offer instrumentation at the service-to-service layer
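To make the cascading-failure bullet concrete, here is a minimal Python sketch of the retry-plus-circuit-breaker behaviour a sidecar proxy applies on behalf of an application. It assumes a simple consecutive-failure breaker (one common pattern, not Envoy’s exact algorithm), and the names `CircuitBreaker`, `call_with_retries` and `CircuitOpenError` are hypothetical. In a real mesh this logic lives in the proxy and is configured declaratively rather than written into each service.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the upstream."""


class CircuitBreaker:
    """Hypothetical consecutive-failure breaker: opens after N failures in a row."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # Half-open: once the timeout has elapsed, let a trial request through.
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)open and restart the timeout


def call_with_retries(breaker: CircuitBreaker, upstream: Callable[[], T],
                      attempts: int = 3) -> T:
    """Retry transient connection errors, but fail fast while the circuit is open."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        if not breaker.allow():
            raise CircuitOpenError("circuit open; failing fast instead of piling on")
        try:
            result = upstream()
        except ConnectionError as exc:
            breaker.record_failure()
            last_error = exc
            continue
        breaker.record_success()
        return result
    assert last_error is not None  # only reachable when attempts >= 1
    raise last_error
```

Failing fast once an upstream keeps erroring is what stops one unhealthy service from tying up every caller further up the chain.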
The details of how these features are implemented vary between providers.
In a microservices architecture, the service mesh is a key layer in determining how your applications behave at runtime and in boosting their reliability. Application functions that previously ran locally as part of a shared runtime now occur as remote procedure calls sent across an unreliable network. Whether the complex decision trees that underpin your business needs succeed or fail depends on reliable, consistent results and on accounting for the realities of programming distributed systems.
A service mesh shares some similarities with other message management solutions such as API Gateways, Enterprise Application Integration (EAI) patterns or an Enterprise Service Bus (ESB); the key difference is that a service mesh addresses a broader problem set and is implemented outside the applications themselves. Instead of coding remote communication management directly into your apps, you deploy a set of interconnected proxies (“a mesh”), which decouples that logic from your apps and removes the responsibility from developers.
The control plane vs. the data plane
The control plane is the set of policies and configuration that controls traffic: it decides how the mesh should behave and distributes those decisions to the proxies. The data plane, meanwhile, consists of the proxies that act on the data (network packets) flowing into and out of each microservice, applying the capabilities listed above (routing, load balancing, security, etc.).
The data plane is typically implemented as a “sidecar” proxy, which runs alongside each microservice. The most popular data plane is currently Envoy Proxy, an open source edge and service proxy created by engineers at Lyft. This is the data plane used by AWS App Mesh and by many other organizations, including Airbnb, Booking.com, IBM, Medium, Netflix and Uber.
Istio, backed by Google, has so far been the dominant open source service mesh in the control plane space, but AWS App Mesh has now moved into that space too.
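The split between the two planes can be sketched in a few lines of Python. The control plane stores the desired policy and pushes every change to the proxies. Each data-plane proxy only applies whatever configuration it last received. `ControlPlane`, `SidecarProxy` and `RoutePolicy` are hypothetical names for illustration. Real control planes such as Istio’s deliver configuration to Envoy over Envoy’s xDS discovery APIs rather than through direct method calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RoutePolicy:
    """Desired behaviour for traffic to one logical service (hypothetical schema)."""
    service: str
    timeout_s: float
    retries: int


@dataclass
class SidecarProxy:
    """Data plane: keeps the last config it was given and consults it per request."""
    name: str
    config: Dict[str, RoutePolicy] = field(default_factory=dict)
    version: int = 0

    def apply(self, version: int, config: Dict[str, RoutePolicy]) -> None:
        self.version = version
        self.config = dict(config)  # snapshot, so later edits need a new push

    def policy_for(self, service: str) -> RoutePolicy:
        return self.config[service]


class ControlPlane:
    """Control plane: owns the policy and pushes every change to all proxies."""

    def __init__(self) -> None:
        self.policies: Dict[str, RoutePolicy] = {}
        self.proxies: List[SidecarProxy] = []
        self.version = 0

    def register(self, proxy: SidecarProxy) -> None:
        self.proxies.append(proxy)
        proxy.apply(self.version, self.policies)  # new proxies get current state

    def set_policy(self, policy: RoutePolicy) -> None:
        self.policies[policy.service] = policy
        self.version += 1
        for proxy in self.proxies:
            proxy.apply(self.version, self.policies)
```

The proxies never decide policy themselves. Changing a timeout or retry count is a single control-plane update that every sidecar picks up.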
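As a rough illustration of the sidecar idea, the Python sketch below stands in for the proxy in-process. The application addresses only a logical service name. The sidecar picks a concrete endpoint (round-robin here, one of several load-balancing algorithms a proxy may offer) and counts requests as a side effect. The class, the `transport` callable and the addresses are hypothetical. A real sidecar such as Envoy runs as a separate process, and Istio typically redirects traffic to it transparently with iptables rules.

```python
import itertools
from typing import Callable, Dict, Iterator, List

# Hypothetical transport: sends `path` to a concrete "host:port" endpoint.
Transport = Callable[[str, str], str]


class OutboundSidecar:
    """Resolves a logical service name to endpoints and round-robins across them."""

    def __init__(self, endpoints: Dict[str, List[str]], transport: Transport) -> None:
        self._rr: Dict[str, Iterator[str]] = {
            name: itertools.cycle(addrs) for name, addrs in endpoints.items()
        }
        self._transport = transport
        self.request_count = 0  # instrumentation the app never had to write

    def send(self, service: str, path: str) -> str:
        endpoint = next(self._rr[service])  # round-robin load balancing
        self.request_count += 1
        return self._transport(endpoint, path)


def place_order(sidecar: OutboundSidecar) -> str:
    # Application code only knows the logical name "inventory".
    return sidecar.send("inventory", "/reserve")
```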
AWS App Mesh vs. Google Istio
AWS App Mesh
In November, AWS released a public preview of its own service mesh for monitoring and controlling communications across microservices applications on AWS. App Mesh can be used with microservices running on Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Container Service for Kubernetes (Amazon EKS), and Kubernetes running on Amazon EC2. Its functionality and integrations are still under development.
App Mesh currently uses Envoy, which makes it compatible with other open source and AWS partner tools for monitoring microservices. Observability data can be exported to various AWS and third-party tools, including AWS X-Ray, Amazon CloudWatch, and any third-party monitoring and tracing tools that integrate with Envoy. New traffic routing controls can be configured to enable blue/green and canary deployments for your services.
App Mesh is designed to provide “a consistent, dynamic way to manage the communications between microservices”. Rather than being built into the code of each microservice, the logic for monitoring and controlling communications runs as a proxy next to each microservice. The proxy handles all the network traffic that flows in and out of the microservice and consistently provides “visibility, traffic control, and security capabilities to all of your microservices”.
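Blue/green and canary releases both come down to weighted routing: the proxy sends each request to one of several versions of a service in proportion to configured weights. App Mesh routes express this as a list of weighted targets. The Python below is only a sketch of the selection logic under that assumption, and the version names are hypothetical.

```python
import random
from typing import List, Tuple

# (target, weight) pairs; names such as "reviews-v1" are hypothetical.
WeightedTargets = List[Tuple[str, int]]


def pick_target(targets: WeightedTargets, rng: random.Random) -> str:
    """Choose one target with probability weight / sum(weights)."""
    if any(weight < 0 for _, weight in targets):
        raise ValueError("weights must be non-negative")
    total = sum(weight for _, weight in targets)
    if total == 0:
        raise ValueError("at least one target needs a positive weight")
    roll = rng.randrange(total)  # uniform integer in [0, total)
    cumulative = 0
    for target, weight in targets:
        cumulative += weight
        if roll < cumulative:
            return target
    raise AssertionError("unreachable: roll is always below the total weight")


# Canary: send roughly 10% of requests to the new version.
canary: WeightedTargets = [("reviews-v1", 90), ("reviews-v2", 10)]
# Blue/green: all traffic goes to green after a single weight change.
blue_green: WeightedTargets = [("reviews-blue", 0), ("reviews-green", 100)]
```

Promoting a canary to a full rollout then becomes a sequence of weight changes pushed to the proxies, with no change to application code.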
App Mesh can be used to explore how your different microservices interconnect. App Mesh automatically computes and sends the correct configuration to each microservice proxy.
There is no additional charge for App Mesh beyond the compute resources you already use with ECS, EKS, EC2, etc.
Google Istio
Istio has been the main player in the service mesh arena for a while, and it shares similarities with AWS App Mesh in that it also uses Envoy as its data plane. Both aim to meet a similar set of needs: monitoring and controlling the traffic flow between your microservices. Istio is open source and vendor agnostic. Istio 1.0 was aimed at developers managing their services in a hybrid environment, in which multiple workloads run in different environments (clouds and on-premises, in containerized microservices or in monolithic applications on virtual machines).
Because Istio has been around for much longer than AWS App Mesh, it currently offers a much larger set of functionality and features. These include transport (service-to-service) authentication through mutual TLS (mTLS), and origin (end-user) authentication via JSON Web Tokens (JWTs), with integrations for Auth0, Firebase Auth and Google Auth. Istio also supports service identities across a variety of platforms: not just AWS IAM, but also Kubernetes and GKE/GCE/GCP.
Google standardized Istio as the management layer of its Cloud Services Platform (CSP) in August 2018. It was designed to work in combination with two other new offerings built at the same time: Knative, a Kubernetes-based open source framework for building, deploying and managing serverless workloads, and the on-premises version of Google Kubernetes Engine (GKE), Google’s container management service. As with AWS App Mesh, the goal was to let organizations use Istio as part of CSP to manage an entire ecosystem of containers and serverless infrastructure, from on-premises to the public cloud.
Google made its own recent announcement in December, launching an update to Google Kubernetes Engine that brings integrated support for the Istio service mesh to the service. It is currently in beta.
Istio integrates with Stackdriver; this integration sends service metrics, logs, and traces to Stackdriver (GCP’s native monitoring and logging suite), letting you monitor your “golden signals” (traffic, error rates, and latencies) for all services running in GKE.
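The three golden signals are simple aggregates over the requests the proxies observe. Below is a minimal Python sketch. It assumes that each proxy reports one (service, status, latency) record per request, that 5xx responses count as errors, and that latency is summarized with nearest-rank percentiles. Stackdriver’s own aggregation may differ, and `RequestRecord`/`golden_signals` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestRecord:
    service: str
    status: int        # HTTP status code seen by the proxy
    latency_ms: float


def percentile(sorted_values: List[float], pct: float) -> float:
    """Nearest-rank percentile of an already-sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def golden_signals(records: List[RequestRecord],
                   window_s: float) -> Dict[str, Dict[str, float]]:
    """Traffic, error rate and latency per service over one time window."""
    by_service: Dict[str, List[RequestRecord]] = {}
    for record in records:
        by_service.setdefault(record.service, []).append(record)
    signals: Dict[str, Dict[str, float]] = {}
    for service, rs in by_service.items():
        latencies = sorted(r.latency_ms for r in rs)
        errors = sum(1 for r in rs if r.status >= 500)
        signals[service] = {
            "traffic_rps": len(rs) / window_s,
            "error_rate": errors / len(rs),
            "p50_ms": percentile(latencies, 50),
            "p99_ms": percentile(latencies, 99),
        }
    return signals
```

Because every request passes through a sidecar, these numbers come for free for every service, without touching application code.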
Chen Goldberg, Google Cloud director of Engineering, and Jennifer Lin, Google Cloud director of Product Management, wrote of the Istio on GKE release, “With Istio on GKE, we are the first major cloud provider to offer direct integration to a Kubernetes service and simplified lifecycle management for your containers.”
Google Cloud CTO Urs Hölzle told Diginomica last summer that he expects near universal adoption of Istio: “My expectation would be, 90% of Kubernetes users use Istio two years from now. It’s such a natural fit to what Kubernetes provides, it almost feels like the next iteration of Kubernetes. It’s done by the same team, the two work well together”, adding “We hope many companies will make this a centerpiece of their journey to the cloud and this hopefully makes it a much smoother path to the cloud … Once people are familiar with the Kubernetes and Istio way of managing and orchestrating, cloud will be very not scary”.
In building their own service mesh offerings (albeit based on two of the most popular open source projects), AWS and Google are making it easier to manage microservices across their respective platforms. Both enable a more straightforward approach to orchestrating different endpoints and microservices. As computing becomes increasingly distributed, service meshes like these will become more and more essential to producing useful business outcomes. They make microservices less daunting and more practical: previously, the more autonomous services you had, the more complicated they became to manage. Ultimately, the biggest plus of a service mesh, whether AWS’ or Google’s, is that it lets you concentrate management tasks in one place.
AWS not only has huge engineering resources at its disposal but also widespread popularity within the larger engineering community, so perhaps it will displace Istio despite currently lacking some of its features. A space to watch…
For now, the selection of which service mesh to use will ultimately depend on what platforms you need to support, in addition to operational questions such as what problems you’re currently experiencing while managing your distributed production apps, the level of observability you need for your services, the division of responsibility between teams, and so on.