Deep Look Into Apache Traffic Server

Michael C

8 years ago

Apache Traffic Server is a high-performance caching proxy server best known for its use at Yahoo!, where it processes over 30,000 requests per second and serves more than 30 billion web objects a day across the Yahoo! network. Since its release as open source software in 2009, Apache TS has become one of the leading proxy servers, distributing content to millions of users every day. In this guide, we’ll look more closely at its configuration and features to help you decide whether Apache TS suits your caching needs.

Unlike Varnish and Nginx, which function more explicitly as HTTP accelerators, Apache TS was designed with a broader range of capabilities. It can best be deployed in three different ways:

Web Proxy: Receives user requests for web content and either serves the content directly from its cache or forwards the request to the origin server on the user’s behalf and then stores the response.
Reverse Proxy: Places Apache TS in front of the origin server to accept incoming client requests as if it were the origin server. This takes request handling off the origin and speeds up content delivery.
Cache Hierarchy: Participates in cache hierarchies, where requests that one cache cannot fulfill are routed to other regional caches. Searching nearby caches before going straight to the origin server increases speed and lowers bandwidth usage.
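
The lookup order behind these three modes can be pictured with a small sketch. This is a toy in-memory model, not Apache TS code: the Cache class, the fetch function and the cache names are all hypothetical, and the origin is just a function that returns a response body.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Cache:
    """Toy in-memory cache standing in for one proxy node (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}

    def get(self, url: str) -> Optional[bytes]:
        return self.store.get(url)

    def put(self, url: str, body: bytes) -> None:
        self.store[url] = body


def fetch(
    url: str,
    local: Cache,
    parents: List[Cache],
    origin: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Return (body, source), trying the local cache, then parents, then origin."""
    body = local.get(url)
    if body is not None:
        return body, local.name

    # Cache hierarchy: ask nearby caches before going to the origin.
    for parent in parents:
        body = parent.get(url)
        if body is not None:
            local.put(url, body)  # keep a local copy for the next request
            return body, parent.name

    # Miss everywhere: fetch from the origin and store the response.
    body = origin(url)
    local.put(url, body)
    return body, "origin"
```

With an empty edge cache and a regional cache that already holds an object, the first request is answered by the regional cache and the second by the edge cache itself; the origin is only contacted for objects that no cache holds.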

Installation

For installation, you have two basic options: build it from source or install it from a distribution package. To make sure you have the latest features, Apache recommends building Traffic Server from source, since distribution packages have been known to lag significantly behind the current stable release.

To build it from source, your server will need the following tools and libraries, with further guidelines outlined here.

pkgconfig
libtool
gcc (>= 4.3 or clang > 3.0)
GNU make
openssl
tcl
expat
pcre
libcap
flex (for TPROXY)
hwloc
lua
curses (for traffic_top)
curl (for traffic_top)

Configuration

Once you have Apache TS installed, there are two types of configuration—you can set it up as a reverse proxy or a forward proxy.

Reverse Proxy: The most common configuration is to set up ATS as a transparent and caching reverse proxy that forwards all requests to a single origin address and caches the responses based on their headers.

Forward Proxy: Unlike Varnish or Nginx, Apache TS has the ability to be configured as a transparent forward proxy. This is typically used when you need to improve the performance of a local network’s use of external resources or you want to have the ability to monitor or filter your traffic.

Below, we’ll be exclusively discussing the configuration and features of Apache TS’s reverse proxy, installed from the source code. For more information on their forward proxying capabilities, check here.

Reverse Proxy Configuration

To set up your reverse proxy, a few changes need to be made to the configuration files located in the /opt/ts/etc/trafficserver directory. In the records.config file, make sure that the following settings have been configured:

proxy.config.http.cache.http: enables caching of proxied HTTP requests.
proxy.config.reverse_proxy.enabled: enables reverse proxying support.
proxy.config.url_remap.remap_required: requires a matching remap rule before a request is proxied, which ensures your proxy can’t be used to reach arbitrary websites by users trying to mask their identities.
proxy.config.url_remap.pristine_host_hdr: keeps the Host header of the client request intact, which is useful when the origin server is performing domain-based virtual hosting or other actions dependent on that header.
proxy.config.http.server_ports: binds Apache TS to port 8080 for HTTP traffic.
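
As a rough illustration, the five settings above can be written out as records.config lines. The sketch below assumes the `CONFIG <name> <type> <value>` line format used by records.config in the Traffic Server releases this article covers, with 1 meaning enabled; the SETTINGS list and the render_records helper are hypothetical names used only for this example.

```python
# Assumed values: 1 enables a feature; the port matches the text above.
SETTINGS = [
    ("proxy.config.http.cache.http", "INT", "1"),
    ("proxy.config.reverse_proxy.enabled", "INT", "1"),
    ("proxy.config.url_remap.remap_required", "INT", "1"),
    ("proxy.config.url_remap.pristine_host_hdr", "INT", "1"),
    ("proxy.config.http.server_ports", "STRING", "8080"),
]


def render_records(settings):
    """Format (name, type, value) tuples as records.config lines."""
    return "\n".join(
        f"CONFIG {name} {kind} {value}" for name, kind, value in settings
    )


if __name__ == "__main__":
    print(render_records(SETTINGS))
```

Compare the printed lines with the entries already present in your records.config rather than pasting them in blindly, since defaults differ between releases.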

Having these settings configured will enable reverse proxying and basic security measures. The next step is to make sure Apache TS knows what to proxy. You do this by writing remap rules in the remap.config file.

When doing this, you first have to configure the origin location. If you run TS and the origin web server on the same host, you must reconfigure the origin server to listen on port 8080 and change TS to bind to port 80, which means adjusting proxy.config.http.server_ports accordingly. Now, all requests made to the domain name will be received by Apache TS, which knows to proxy those requests to localhost:8080 if they are not in the cache.
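
To make the remapping step concrete, here is a toy model of how a rule of the form `map <client-facing URL> <origin URL>` rewrites a request. It assumes rules live in remap.config in that three-field form; the hostname www.example.com and the helper names are hypothetical, and real Traffic Server matching is more involved than a prefix check.

```python
# One example rule: send requests for the public site to the local origin.
REMAP_RULES = "map http://www.example.com/ http://localhost:8080/\n"


def parse_rules(text):
    """Collect (source prefix, destination prefix) pairs from 'map' lines."""
    rules = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "map":
            rules.append((parts[1], parts[2]))
    return rules


def remap(url, rules):
    """Rewrite url with the first matching rule, or return None if none match."""
    for source, destination in rules:
        if url.startswith(source):
            return destination + url[len(source):]
    # With remap_required enabled, an unmatched request is not proxied.
    return None


if __name__ == "__main__":
    rules = parse_rules(REMAP_RULES)
    print(remap("http://www.example.com/img/logo.png", rules))
    print(remap("http://unrelated.example.org/", rules))
```

The second call returns None, which mirrors why requiring a remap rule keeps the proxy from relaying traffic to arbitrary sites.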

By default, the configuration provides a 256 MB disk cache located in var/trafficserver/ under the install prefix. You can adjust the size and location in the storage.config file.

Note that any change you make to the cache configuration requires a restart of TS. Also, if you choose to configure it as a forward proxy, reverse proxying must be turned off.

Basic Architecture

Apache TS uses a hybrid event-driven engine with a multi-threaded processing model to handle incoming requests. This means that it scales very well on modern multi-core servers even though it was designed for an older generation of servers.

As an open source product, it has seen many improvements over the years that help it compete with other web accelerators and adapt to current traffic needs. The billions of web objects it serves for Yahoo! every day suggest that this has worked.

Running Apache TS involves three processes that work together to serve requests and manage the health of the system.

traffic_server process: accepts connections, processes protocol requests and serves documents from the cache or origin server.

traffic_manager process: launches, monitors and reconfigures the traffic_server process, and restarts it if it detects a failure. It is also responsible for the proxy autoconfiguration port, the statistics interface and cluster administration.

traffic_cop process: monitors the health of the other two processes by periodically sending them heartbeat requests. In the event of a failure, it restarts the manager and server processes.

You can also use traffic_ctl to collect and process statistics about network traffic. In addition, Apache TS performs transaction logging, which records every request it receives and every error it detects in a log file. This allows you to see how many clients use the Apache TS cache, how much data each client requested, which pages were most popular and so on. You can also see any transaction errors and the state of the server at the time, which helps you set up the system to suit your needs.

Cache Architecture

In addition to its proxying capabilities, Apache TS also serves as a cache. The raw storage for cached content is defined in storage.config. Each line in that file defines a cache span, which is treated as a single uniform unit of storage. The cache spans are then divided into cache volumes, which are user-defined units of persistent storage. For speed, each cache volume is spread across multiple cache spans. The section of a cache volume that sits on a specific cache span is referred to as a cache stripe. Cache stripes are the smallest unit of storage and always reside on a single physical device.
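
The relationship between spans, volumes and stripes can be sketched as a small data model. This is only an illustration under simplifying assumptions: every volume takes the same share of every span, and the device names, sizes and the build_stripes helper are hypothetical rather than anything Apache TS exposes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Stripe:
    """The part of one cache volume that lives on one cache span."""
    volume: str
    span: str
    size_gb: float


def build_stripes(spans: Dict[str, float], volumes: Dict[str, float]) -> List[Stripe]:
    """spans maps span name to size in GB; volumes maps volume name to its share."""
    stripes = []
    for volume, share in volumes.items():
        for span, size_gb in spans.items():
            # Each volume gets one stripe on every span.
            stripes.append(Stripe(volume, span, size_gb * share))
    return stripes


if __name__ == "__main__":
    spans = {"/dev/sdb": 100.0, "/dev/sdc": 100.0}  # hypothetical devices
    volumes = {"volume1": 0.75, "volume2": 0.25}    # shares of the cache
    for stripe in build_stripes(spans, volumes):
        print(stripe)
```

Two spans and two volumes yield four stripes, and each stripe belongs to exactly one span, which is why a stripe never crosses a physical device.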

Each cache stripe is tracked in a directory, which is always fully sized no matter how much content is stored in it. This means that Apache TS does not consume more memory as more content is stored in the cache. Instead, it works on the assumption that if there is enough memory to run an empty cache, there’s enough to run a full one. Because the size of a directory is proportional to the size of its stripe, the memory footprint of Apache TS depends strongly on the size of the disk cache.
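
A back-of-the-envelope estimate shows how directory memory scales with disk size. The two constants below are assumptions, not figures from this article: roughly 10 bytes per directory entry and one entry per 8000 bytes of stripe (an assumed average object size). Check both against the documentation for your release before relying on the result.

```python
ENTRY_BYTES = 10          # assumed size of one directory entry
AVG_OBJECT_BYTES = 8000   # assumed average object size used to size the directory


def directory_bytes(stripe_bytes: int) -> int:
    """Estimate directory memory for a stripe of the given size in bytes."""
    entries = stripe_bytes // AVG_OBJECT_BYTES
    return entries * ENTRY_BYTES


if __name__ == "__main__":
    one_tb = 10**12
    print(directory_bytes(one_tb) / 10**9, "GB of directory for 1 TB of cache")
```

Under these assumptions a 1 TB cache needs about 1.25 GB of memory for its directories whether it is empty or full, which matches the fixed-footprint behaviour described above.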

Apache TS has an object database that indexes content according to URLs and headers. This database can efficiently store very small or very large objects, including alternate versions of the same object in a different language or encoding. Apache TS also manages its own space, progressively removing stale data when the cache is full.

The database stores two types of data for each object: metadata and content data. Metadata is everything known about the object and its content, including the HTTP headers. Content data is the body of the object, which is what is delivered to the client.

The cache architecture is also designed to tolerate failures of any cache disk. If a disk fails, Apache TS marks the entire disk as corrupt and continues to use the remaining disks. Since each storage unit in each cache volume is mostly independent, the loss of a disk means that the part of a cache volume on that span shuts down, while the corresponding parts on other spans continue to store data. The architecture also supports a RAM cache that serves the most popular objects as fast as possible, which reduces load on the disks during traffic peaks and decreases the chances of system failure.

Cache Operations

Once an HTTP request header has been parsed and remapped, the caching process begins. For an object to be cached, it must first be considered cache valid, meaning that it meets the requirements set by the cache operations. The three basic operations that define what can be cached are as follows:

cache lookup: determines whether an object is in the cache and, if so, where it is located. It also verifies that the object is still present in the cache.
cache read: follows a successful lookup. It checks how old the object is by looking at the headers and possibly other metadata.
cache write: a virtual connection that receives the data and writes it to the cache. It also covers existing objects that need to be modified. Evacuation is also driven by the cache write.
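
The division of labour between the three operations can be sketched with a toy object cache. This is a simplified model under stated assumptions, not the Traffic Server implementation: freshness is reduced to a single max_age value in seconds, evacuation is left out, and the ObjectCache and Entry names are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    body: bytes
    stored_at: float  # time the object was written, in seconds
    max_age: float    # assumed freshness lifetime, in seconds


class ObjectCache:
    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def lookup(self, url: str) -> Optional[Entry]:
        """Cache lookup: is the object present, and where?"""
        return self._entries.get(url)

    def read(self, url: str, now: Optional[float] = None) -> Optional[bytes]:
        """Cache read: after a successful lookup, check the object's age."""
        entry = self.lookup(url)
        if entry is None:
            return None
        now = time.time() if now is None else now
        if now - entry.stored_at > entry.max_age:
            return None  # too old to serve without going back to the origin
        return entry.body

    def write(self, url: str, body: bytes, max_age: float,
              now: Optional[float] = None) -> None:
        """Cache write: store a new object or replace an existing one."""
        now = time.time() if now is None else now
        self._entries[url] = Entry(body, now, max_age)
```

Passing an explicit now value makes the freshness check easy to try out: an object written at time 0 with a max_age of 60 is readable at time 30 but not at time 61, even though a lookup still finds it.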

Within these operations all the parameters can be set for what types of content and headers are valid to be cached. For tuning purposes, it’s necessary to adjust these constraints to best serve your specific needs.

Is Apache TS for you?

Deciding which reverse proxy or caching software to use depends greatly on your site and its specific needs. As a quick overview, below are some pros and cons specific to Apache TS.

Pro: Ability to serve in a cache hierarchy. Requests not fulfilled from one cache are routed to other regional caches, leveraging the contents and proximity of nearby caches. Many other systems lack this feature.

Con: Load balancing is only offered as an experimental plugin. Other software, such as Nginx, has more features dedicated specifically to load balancing.

Pro: Security options. Apache TS can terminate SSL connections itself, something Varnish has traditionally left to a separate proxy placed in front of it. You can also configure Apache TS to use multiple DNS servers to match the site’s security configuration, and to verify that clients are authenticated before they can access content from the cache.

Pro: Tuning. Allows you to adjust memory allocation, CPU selection and disk storage parameters, as well as thread scaling and thread affinity to take advantage of cache pipelines and faster memory access.

While there’s no one definitive answer for what proxy server is optimal, with this information, hopefully you will have all the details necessary to make an informed decision regarding your proxying needs.