Bizety: Research & Consulting

Deep Look Into Apache Traffic Server

Apache Traffic Server is a high-performance caching web proxy server known for its use by Yahoo!, processing over 30,000 requests per second and serving more than 30 billion web objects a day across the Yahoo! network. Since its release as open source software in 2009, Apache TS has become one of the leading proxy servers, distributing content to millions of users on a daily basis. In this guide, we’ll delve deeper into its configuration and features to help you decide whether Apache TS suits your caching needs.

Unlike Varnish and Nginx, which function more explicitly as HTTP accelerators, Apache TS was designed with a broader range of capabilities. It can be deployed in three different ways: as a web proxy cache, as a reverse proxy, or in a cache hierarchy.

Installation

For installation, you have two basic options: build it from the source code or install it from distribution packages. To ensure you have the latest features, Apache recommends that you build Traffic Server straight from the source code. Distribution packages have been known to lag behind the current stable release by a significant amount.

In order to install it from the source, your server will need the following tools and libraries to properly build the software, with further guidelines outlined here.

Configuration

Once you have Apache TS installed, there are two types of configuration: you can set it up as a reverse proxy or as a forward proxy.

Below, we’ll be exclusively discussing the configuration and features of Apache TS’s reverse proxy, installed from the source code. For more information on their forward proxying capabilities, check here.

Reverse Proxy Configuration

To set up your reverse proxy, a few changes need to be made to the configuration files located in the /opt/ts/etc/trafficserver directory. In the records.config file, make sure that the following settings have been configured:
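As a rough illustration of what these settings look like, the Python sketch below checks a records.config excerpt for a handful of values. The setting names are an assumption taken from the Traffic Server getting-started documentation rather than from this article, and the excerpt itself is hypothetical, so adjust both to your own installation.

```python
# Settings commonly enabled for a caching reverse proxy.
# ASSUMPTION: this set comes from the Traffic Server getting-started
# documentation; it is not necessarily the exact list this article meant.
REQUIRED = {
    "proxy.config.http.cache.http": "1",              # enable HTTP caching
    "proxy.config.reverse_proxy.enabled": "1",        # turn on reverse proxying
    "proxy.config.url_remap.remap_required": "1",     # reject requests with no remap rule
    "proxy.config.url_remap.pristine_host_hdr": "1",  # keep the client's Host header
}


def parse_records(text):
    """Return {name: value} for every 'CONFIG name TYPE value' line."""
    settings = {}
    for line in text.splitlines():
        parts = line.split(None, 3)
        if len(parts) == 4 and parts[0] == "CONFIG":
            _, name, _type, value = parts
            settings[name] = value.strip()
    return settings


def missing_settings(text):
    """List the required settings that are absent or set to another value."""
    found = parse_records(text)
    return sorted(name for name, value in REQUIRED.items()
                  if found.get(name) != value)


# Hypothetical excerpt of records.config, for illustration only.
SAMPLE = """\
# hypothetical excerpt of records.config
CONFIG proxy.config.http.cache.http INT 1
CONFIG proxy.config.reverse_proxy.enabled INT 1
CONFIG proxy.config.url_remap.remap_required INT 1
CONFIG proxy.config.http.server_ports STRING 8080 8080:ipv6
"""

print(missing_settings(SAMPLE))
```

Run against the sample, the check reports that proxy.config.url_remap.pristine_host_hdr still needs to be set.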

Having these settings configured will enable reverse proxying and basic security measures. The next step is to make sure Apache TS knows what to proxy. You do this by writing remap rules in the remap.config file; the conf_remap plugin can then be used to override individual settings for a specific rule.

When doing this, you first have to configure the origin location. If you run TS and the origin web server on the same host, you must reconfigure the origin server to listen on port 8080 and change TS to bind to port 80. Now, all requests made to the domain name will be received by Apache TS, which knows to proxy those requests to localhost:8080 if they are not in the cache.
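To make the mapping concrete, here is a minimal Python sketch of how a single remap rule rewrites an incoming URL. The `map` line follows the remap.config syntax, but the domain www.example.com is a placeholder, and the prefix matching shown here is a simplification of what Traffic Server actually does.

```python
def parse_remap(text):
    """Collect (source, target) pairs from 'map <source> <target>' lines."""
    rules = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "map":
            rules.append((parts[1], parts[2]))
    return rules


def remap(url, rules):
    """Rewrite url with the first rule whose source prefix matches, else None."""
    for source, target in rules:
        if url.startswith(source):
            return target + url[len(source):]
    return None


# Hypothetical rule: the public site is served by an origin on localhost:8080.
REMAP_CONFIG = "map http://www.example.com/ http://localhost:8080/\n"

rules = parse_remap(REMAP_CONFIG)
print(remap("http://www.example.com/img/logo.png", rules))
print(remap("http://other.example.org/", rules))
```

The first request is rewritten to the origin on port 8080, while the second matches no rule, which is the case a reverse proxy with remapping required would refuse.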

By default, the configuration provides a 256 MB disk cache located in var/trafficserver/ under the install prefix. You can adjust the size and location in the storage.config file.
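A storage.config entry is simply a path followed by a size. The short Python sketch below reads one such line and converts the size to bytes; it assumes the single-letter K/M/G/T suffixes and uses the default 256 MB entry as its example.

```python
# Size suffixes assumed to be accepted in storage.config (powers of 1024).
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_storage_line(line):
    """Split a 'path size' line and return (path, size_in_bytes)."""
    path, size = line.split()[:2]
    unit = size[-1].upper()
    if unit in UNITS:
        return path, int(size[:-1]) * UNITS[unit]
    return path, int(size)  # no suffix: the value is already in bytes


print(parse_storage_line("var/trafficserver 256M"))
```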

Note that any change you make to the cache configuration requires a restart of TS. Also, if you choose to configure it as a forward proxy, reverse proxying must be turned off.

Basic Architecture

Apache TS uses a hybrid event-driven engine with a multi-threaded processing model to handle incoming requests. This means that it scales very well on modern multi-core servers even though it was designed for an older generation of servers.

As an open source product, it has been developed steadily over the years to help it compete with other web accelerators and adapt to current traffic needs. The billions of web objects it serves for Yahoo! every day suggest it has been successful at both.

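Apache TS runs as three processes that work together to serve requests and manage the health of the system: traffic_server, which handles the requests themselves; traffic_manager, which launches and monitors traffic_server and manages its configuration; and traffic_cop, which watches the health of the other two.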

You can also use traffic_ctl to collect and process statistics about network traffic. Apache TS performs transaction logging, which records information in a log file about every request it receives and every error it detects. This allows you to see how many clients use the Apache TS cache, how much data each client requested, which pages were most popular, and so on. You can also see any transaction errors and the state of the server at the time, which helps you tune the system to suit your needs.
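As a sketch of the kind of analysis these logs allow, the Python snippet below counts clients, finds the most requested URL, and computes a cache hit ratio. It assumes a Squid-style access log layout (timestamp, elapsed time, client IP, cache result/status, size, method, URL, and so on); the sample lines are invented, and your own log format may differ.

```python
from collections import Counter


def summarize(lines):
    """Summarize Squid-style access log lines (field layout is an assumption)."""
    clients = set()
    urls = Counter()
    hits = total = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip lines that do not match the assumed layout
        total += 1
        clients.add(fields[2])            # client IP
        urls[fields[6]] += 1              # requested URL
        result = fields[3].split("/")[0]  # e.g. TCP_HIT from TCP_HIT/200
        if result.endswith("HIT"):
            hits += 1
    return {
        "clients": len(clients),
        "top_url": urls.most_common(1)[0][0] if urls else None,
        "hit_ratio": hits / total if total else 0.0,
    }


# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    "1460000000.123 12 10.0.0.1 TCP_HIT/200 5120 GET http://www.example.com/ - NONE/- text/html",
    "1460000001.456 85 10.0.0.2 TCP_MISS/200 20480 GET http://www.example.com/a.png - DIRECT/localhost image/png",
    "1460000002.789 3 10.0.0.1 TCP_MEM_HIT/200 5120 GET http://www.example.com/ - NONE/- text/html",
]

summary = summarize(SAMPLE_LOG)
print(summary)
```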

Cache Architecture

In addition to all its proxying capabilities, Apache TS also serves as a cache. The raw storage for cached content is defined in storage.config. Each line in that file defines a cache span, which is treated as a single, undivided unit of storage. These cache spans are then organized into cache volumes, which are user-defined units of persistent storage. For speed, each cache volume is spread across multiple cache spans. The section of a cache volume that sits on a specific cache span is referred to as a cache stripe. Cache stripes are the smallest unit of storage and always reside on a single physical device.
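The relationship between spans, volumes, and stripes can be pictured with a small Python model. This is a simplified sketch, not how Traffic Server computes its layout: the device paths and percentages are made up, and real stripe sizes are subject to rounding that is ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stripe:
    volume: int   # cache volume this stripe belongs to
    span: str     # cache span (one physical device) it lives on
    size_mb: int


def build_stripes(spans, volume_shares):
    """Give every volume one stripe on every span.

    spans: {path: size_mb}; volume_shares: {volume_id: percent of each span}.
    """
    stripes = []
    for path, size_mb in spans.items():
        for volume, percent in volume_shares.items():
            stripes.append(Stripe(volume, path, size_mb * percent // 100))
    return stripes


# Hypothetical layout: two disks, two volumes taking 75% and 25% of each disk.
spans = {"/dev/sdb": 1000, "/dev/sdc": 2000}
shares = {1: 75, 2: 25}

stripes = build_stripes(spans, shares)
for stripe in stripes:
    print(stripe)
```

Each volume ends up with one stripe per span, so reads and writes for a volume are spread across every disk while each individual stripe stays on one device.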

The content of each cache stripe is tracked in a directory, which is always fully sized no matter how much content is stored in the stripe. This means that Apache TS does not consume more memory as more content is stored in the cache. Instead, it works on the assumption that if there is enough memory to run an empty cache, there is enough to run a full one. The size of a directory is therefore tied to the size of its stripe, which is why the memory footprint of Apache TS depends strongly on the size of the disk cache.
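A back-of-the-envelope estimate shows why the footprint follows disk size rather than cache contents. The Python sketch below assumes roughly 10 bytes per directory entry and one entry per 8000 bytes of stripe (the documented default average object size); treat both figures as assumptions and check them against your version.

```python
def directory_bytes(stripe_bytes, avg_object_bytes=8000, entry_bytes=10):
    """Estimate directory memory for a stripe.

    ASSUMPTIONS: one directory entry per avg_object_bytes of storage and
    entry_bytes per entry. The result depends only on the stripe size,
    never on how full the cache is.
    """
    entries = stripe_bytes // avg_object_bytes
    return entries * entry_bytes


one_terabyte = 1_000_000_000_000
estimate = directory_bytes(one_terabyte)
print(estimate / 1_000_000_000, "GB of directory for a 1 TB cache")
```

Under these assumptions a 1 TB cache needs about 1.25 GB of memory for its directory, whether it holds one object or is completely full.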

Apache TS has an object database that indexes content according to URLs and headers. This database can efficiently store very small or very large objects, and it can hold alternate versions of the same object, for example in a different language or encoding. Apache TS also self-cycles, progressively removing stale data when the cache is full.

Two types of objects are stored in the database: metadata and content data. Metadata is all the data about the object and its content, including the HTTP headers. The content data is the content of the object itself, which is what gets delivered to the client.

The cache architecture is also designed to tolerate the failure of any cache disk. If a disk fails, Apache TS marks the entire disk as corrupt and continues to use the remaining disks. Since each storage unit in each cache volume is mostly independent, the loss of a disk only takes out the part of each cache volume stored on that span, while the corresponding parts on the other spans continue to store data. The architecture also supports a RAM cache that serves the most popular objects as fast as possible, which reduces load on the disks during traffic peaks and decreases the chances of system failure.

Cache Operations

Once an HTTP request header has been parsed and remapped, the caching process begins. For an object to be cached, however, it first needs to be deemed cache valid, meaning that it meets the requirements set by the cache operations. The three basic operations that are used to define what can be cached are as follows:

Within these operations all the parameters can be set for what types of content and headers are valid to be cached. For tuning purposes, it’s necessary to adjust these constraints to best serve your specific needs.

Is Apache TS for you?

The choice of reverse proxy or caching software depends greatly on your site and its specific needs. As a quick overview, below are some pros and cons specific to Apache TS.

While there’s no one definitive answer for which proxy server is optimal, this information should give you the details necessary to make an informed decision regarding your proxying needs.

Copyright secured by Digiprove © 2016