As complex functions such as data compression and cryptography are increasingly performed on data packets, network traffic throughput can suffer. One way to increase throughput and reduce latency is to use open-source packet processors and packet-processing acceleration techniques. To understand how these tools optimize the process, it’s important to first understand how packet processing works.
Each file transmitted over the Internet is broken into chunks, or packets, composed of three parts. The header provides information about the packet, such as its origin, destination, and length; the payload is the data itself; and the trailer marks the end of the packet and carries information for error detection and correction. Because packets take many different routes through the network and often arrive out of order, the information carried in the header and trailer is used to reassemble the file for user consumption when the packets reach their destination.
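The anatomy described above can be sketched in a few lines of code. This is an illustrative model, not any real wire format: the field names (`seq`, `total`, `checksum`) and the byte-sum checksum are simplifying assumptions standing in for real sequencing and error-detection mechanisms.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    # Header fields: where this chunk sits in the original file.
    seq: int
    total: int
    # Payload: the data itself.
    payload: bytes
    # Trailer field: a toy error-detection value.
    checksum: int

def make_packets(data: bytes, chunk_size: int) -> list[Packet]:
    """Break a file into packets of at most chunk_size bytes."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [Packet(seq=i, total=len(chunks), payload=c, checksum=sum(c) % 256)
            for i, c in enumerate(chunks)]

def reassemble(packets: list[Packet]) -> bytes:
    """Packets may arrive in any order: verify each checksum, then
    sort by sequence number to rebuild the original file."""
    for p in packets:
        if sum(p.payload) % 256 != p.checksum:
            raise ValueError(f"packet {p.seq} failed error detection")
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))
```

Even if the packets are handed to `reassemble` in reverse or shuffled order, the sequence numbers restore the original byte stream.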
As packets of data move through various network elements, different algorithms are used to control the dataflow. Just as a network is divided into the control plane and data plane, packet processing algorithms are applied either to the control information, which is used to transfer the packet to its destination, or to the payload, performing content-driven actions on the data itself. However, delays and bottlenecks can occur during packet processing due to functions such as analog-to-digital conversion in voice and video applications. These delays grow as data becomes increasingly complex and bandwidth hungry. To overcome them, a variety of architectural approaches and open-source packet processing software has been developed, several of which we’ve examined below:
Because Linux is designed as a generic system that caters to many different applications, many core cycles are spent on context switching, locks, and higher OS layers, which can lead to poor cache utilization and bottlenecks. As a result, packet processing acceleration is needed to optimize the networking stack in a scalable manner. Because data-path processing requires far less code and runs far more often than control-path processing, the best approach is to accelerate either the datapath or an application-specific fastpath (ASF). The goal of fastpath is to take care of all routine operations, freeing the normal path for special connections, core database, and control packets.
- Datapath acceleration through fastpath – For software-based datapath acceleration through fastpath, design principles include avoiding multiple lookups, leveraging hardware functionality for higher throughput, following a run-to-completion model for reduced context switching and better cache usage, using intelligent data structures to avoid locks during packet processing, and recycling buffers for faster memory access.
- ASF – ASF accelerates packet processing for frequently used functionality, freeing the normal path for stateful network processing. ASF implementations consist of a packet engine, which performs the packet processing; configuration APIs, which provide an interface for any networking stack or OS; and control logic, which offloads packet-control information from the Linux stack and configures the packet engine through the APIs. Essentially, ASF provides a set of rules for processing incoming packets. When packets are received, they are checked for matches in the ASF and Linux lookup tables. If a match is found, the packet is processed through the fastpath; if not, it is returned to the Linux stack for processing through the normal path.
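The match-or-fallback dispatch described above can be sketched as follows. This is a hypothetical illustration of the control flow, not ASF's actual API: the flow-table structure, `install_flow`, and the tuple flow keys are all invented for the example.

```python
# Flow table: entries installed by the control logic, consulted by the
# packet engine. Maps a flow key (here, a simplified 3-tuple) to an action.
fastpath_table = {}

def install_flow(flow_key, action):
    """Control logic offloads an established flow into the packet engine."""
    fastpath_table[flow_key] = action

def normal_path(packet):
    """Stand-in for full stateful processing in the Linux networking stack."""
    return ("slowpath", packet)

def process(flow_key, packet):
    """Packet engine: matched packets take the fastpath, misses fall
    back to the normal path."""
    action = fastpath_table.get(flow_key)
    if action is not None:
        return action(packet)      # routine traffic: fastpath
    return normal_path(packet)     # no matching rule: back to Linux
```

Once the control logic installs a rule for a flow, every subsequent packet in that flow bypasses the normal path; unmatched flows still receive full stack processing.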
Snabb is a packet networking toolkit that compiles into a standalone executable containing multiple applications and runs on any Linux/x86-64 distribution. It is written in Lua, compiled with the LuaJIT just-in-time compiler, and performs Ethernet I/O in kernel-bypass mode. In addition, Snabb is a community of programmers dedicated to building new networking elements. The first generation of applications created by the Snabb community includes:
- Snabb NFV – implements a network functions virtualization component based on Snabb. This improves QEMU/KVM networking performance for applications that require high packet rates, and is intended for processing up to 100 Gbps or 50 Mpps of network traffic per server.
- lwAFTR – a component of an IPv6 transition technology that allows ISPs to provide their users IPv4 access while maintaining an IPv6-only internal network. It can also be used to share IPv4 addresses between different customers to lower costs and circumvent IPv4 address exhaustion.
- Packetblaster – a tool for moving high volumes of traffic while using a very small percentage of CPU, allowing it to be used for packet processing on a small server or a Device Under Test.
PacketShader is a PC-based software routing platform that uses graphics processing units (GPUs) to accelerate packet processing. GPUs allow for fast graphics rendering and high-bandwidth parallel applications with large computation cycles. By offloading computation- and memory-intensive routing workloads to the GPU, PacketShader avoids the CPU performance bottleneck typical of high-speed software routers.
To further improve GPU utilization on high-speed software routers in Linux, PacketShader also performs packet I/O optimizations such as pre-allocating packet buffers, batch processing, NUMA-aware data placement, and leveraging receive-side scaling (RSS) for linear scalability across multi-core CPUs.
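A toy cost model shows why batching is such an effective I/O optimization: a fixed per-call overhead (crossing the driver or kernel boundary) is paid once per batch instead of once per packet. The overhead constants below are arbitrary illustrative values, not measurements of PacketShader.

```python
# Assumed, illustrative costs (arbitrary units, e.g. cycles).
PER_CALL_OVERHEAD = 10   # fixed cost of one I/O call into the driver
PER_PACKET_COST = 1      # marginal cost of handling one packet

def cost_scalar(n_packets: int) -> int:
    """One I/O call per packet: overhead paid n_packets times."""
    return n_packets * (PER_CALL_OVERHEAD + PER_PACKET_COST)

def cost_batched(n_packets: int, batch_size: int) -> int:
    """One I/O call per batch: overhead amortized over batch_size packets."""
    n_batches = -(-n_packets // batch_size)  # ceiling division
    return n_batches * PER_CALL_OVERHEAD + n_packets * PER_PACKET_COST
```

With these assumed constants, 64 packets cost 704 units one at a time but only 84 units in batches of 32; the larger the batch, the closer the cost approaches the pure per-packet floor.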
Vector Packet Processing (VPP) is the open-source version of Cisco’s VPP technology. It is an out-of-the-box packet processing stack that is hardware, kernel, and deployment agnostic and runs on commodity CPUs. The modular, extensible, and flexible framework is built on a packet processing graph, meaning anyone can plug in new graph nodes, rearrange the graph, or build an independent plugin, allowing for the creation of any kind of packet processing application. Moreover, this can be done without changing the core code as the engine runs in userspace. The platform is feature-rich, providing out-of-the-box switch/router functionality, fast lookup tables for bridge entries and routes, and arbitrary n-tuple classifiers at a high level, as well as many other IPv4/IPv6, MPLS, and L2 features.
In contrast to scalar packet processors, which process one packet at a time, vector processing reads the largest available number of packets from the network I/O layer and processes the entire vector through the packet processing graph at once. This results in faster and more consistent performance, since throughput and latency stabilize as instruction-cache (i-cache) misses are amortized over a large number of packets.
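The scalar/vector distinction can be sketched with a toy processing graph. The node names (`parse`, `lookup`, `rewrite`) and dict-based packets are invented for illustration; real VPP graph nodes are C functions operating on packet vectors. Both styles produce the same result, but the vector style runs each node's code over the whole batch before moving on, which is what keeps that code hot in the instruction cache.

```python
# Toy graph nodes: each takes a vector (list) of packets and returns a
# new vector. Packets are modeled as plain dicts (requires Python 3.9+
# for the dict merge operator `|`).
def parse(pkts):   return [p | {"parsed": True} for p in pkts]
def lookup(pkts):  return [p | {"next_hop": "eth0"} for p in pkts]
def rewrite(pkts): return [p | {"ttl": p["ttl"] - 1} for p in pkts]

GRAPH = [parse, lookup, rewrite]

def vector_process(packets):
    """Vector style: the entire batch traverses each node in turn."""
    vec = packets
    for node in GRAPH:
        vec = node(vec)
    return vec

def scalar_process(packets):
    """Scalar style: each packet walks the whole graph before the next
    packet starts, re-touching every node's code once per packet."""
    return [vector_process([p])[0] for p in packets]
```

Functionally the outputs are identical; the performance difference in a real engine comes from the memory-access and i-cache behavior of the loop order, which this sketch only mimics structurally.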