Facebook Optimizes Its Caching Layer, and Other News


Facebook’s Exposé on Caching in Its CDN

Facebook recently published a description of its CDN caching upgrade process, outlining the evolution from McDipper to BlockCache to RIPQ. Facebook strives to use its CDN to deliver photos and videos to its 1.65 billion users while reducing backbone traffic and backend strain on its storage systems. The Facebook CDN (FBCDN) has worked to get the most out of its SSDs through a series of system upgrades. Though the SSDs provide higher capacity and higher random-read rates than hard disk drives, they also suffer from high write amplification, in which far more data is physically written to the device than was logically requested, reducing both the throughput and the lifespan of the SSD.
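For reference, write amplification is usually expressed as a simple factor. The snippet below only illustrates that general definition with made-up numbers; it is not anything from Facebook's systems:

```python
# Illustrative only: the standard write-amplification factor, not Facebook's code.
def write_amplification(physical_bytes_written: float, logical_bytes_written: float) -> float:
    """Bytes physically written to flash divided by bytes the host asked to write."""
    return physical_bytes_written / logical_bytes_written

# Example: if caching software causes 390 MiB of flash writes in order to persist
# 100 MiB of new cache entries, the factor is 3.9 -- the worst case cited below.
print(write_amplification(390 * 2**20, 100 * 2**20))   # 3.9
```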

Motivated by a 2014 study, Facebook began experimenting with caching algorithms more advanced than its First-In-First-Out (FIFO) logic and observed hit-rate improvements of 8-21%. Facebook then began to look for ways to improve upon its caching software at the time, McDipper, and BlockCache was introduced in late 2014. BlockCache leveraged Segmented LRU logic over fixed-size blocks of items positioned in the logical address space of the SSD.
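To make the Segmented LRU idea concrete, here is a minimal in-memory sketch in Python. The class, segment sizes, and per-key granularity are illustrative; this is not BlockCache's implementation, which operates on large blocks laid out on flash:

```python
from collections import OrderedDict

class SegmentedLRU:
    """Minimal sketch of Segmented LRU (SLRU): new items enter a probationary
    segment and are promoted to a protected segment on a second access."""

    def __init__(self, probation_size, protected_size):
        self.probation = OrderedDict()   # least recently used item sits first
        self.protected = OrderedDict()
        self.probation_size = probation_size
        self.protected_size = protected_size

    def get(self, key):
        if key in self.protected:         # hit in protected: refresh recency
            self.protected.move_to_end(key)
            return self.protected[key]
        if key in self.probation:         # second access: promote the item
            value = self.probation[key]
            self.put(key, value)
            return value
        return None                       # miss

    def put(self, key, value):
        if key in self.protected:
            self.protected[key] = value
            self.protected.move_to_end(key)
        elif key in self.probation:       # second touch promotes the item
            self.probation.pop(key)
            self.protected[key] = value
            if len(self.protected) > self.protected_size:
                old_key, old_val = self.protected.popitem(last=False)
                self._admit(old_key, old_val)   # demote, rather than drop, the overflow
        else:
            self._admit(key, value)

    def _admit(self, key, value):
        self.probation[key] = value
        if len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)  # evict the coldest probationary item
```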

As block sizes, and therefore the number of items per block, increased, FTL write amplification decreased, but the fidelity of the caching algorithm decreased as well. Striking a compromise between block size, fidelity, and amplification, BlockCache improved the hit ratio by 10% over the FIFO baseline while increasing write amplification from 1x to 3.9x with 64 MiB blocks. However, BlockCache still tracked access patterns at a very coarse granularity and inherited the drawbacks of the Segmented LRU algorithm.

Eventually, Facebook worked to find a middle ground between McDipper and BlockCache that would maximize the hit ratio (the fraction of requests served from the cache) on the SSDs while minimizing write amplification, thereby extending the drives' lifespan and preserving their throughput. That work produced RIPQ, the Restricted Insertion Priority Queue.

Though LRU- and LFU-style policies both improved hit ratios, they also caused large increases in write amplification. RIPQ is a caching framework that decouples flash I/O management from the caching policy, so that many different caching algorithms can run efficiently on modern flash devices. It improves upon both predecessors by using priority-aware memory blocks, virtual blocks, and lazy reinsertions and evictions. Using the same block size as BlockCache, RIPQ improved the hit ratio by a further 3-5% and reduced write amplification to about 1.2x, making roughly 3x more flash bandwidth available than BlockCache and extending the lifespan of the SSD.
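To give a rough feel for how restricted insertion points and virtual blocks fit together, here is a toy per-key Python sketch. The class name, section count, and capacity are invented, and the real RIPQ manages large flash blocks rather than individual keys, so treat this as a sketch of the idea only:

```python
from collections import deque

class RestrictedInsertionPQ:
    """Toy sketch of RIPQ's core idea: an approximate priority queue that only
    allows insertion at a fixed number of points (section heads), so data can be
    written sequentially in large chunks."""

    def __init__(self, num_sections=8, capacity=1024):
        self.sections = [deque() for _ in range(num_sections)]  # index 0 = coldest
        self.num_sections = num_sections
        self.capacity = capacity
        self.virtual_priority = {}   # "virtual block" trick: record new priorities lazily

    def _section_for(self, priority):
        return min(int(priority * self.num_sections), self.num_sections - 1)

    def insert(self, key, priority):
        """Insert at the head of the section covering `priority` in [0, 1)."""
        self.sections[self._section_for(priority)].appendleft(key)
        self.virtual_priority[key] = priority
        while sum(len(s) for s in self.sections) > self.capacity:
            self._evict_one()

    def access(self, key, new_priority):
        """On a cache hit, only bump the item's virtual priority; no rewrite yet."""
        if key in self.virtual_priority:
            self.virtual_priority[key] = max(self.virtual_priority[key], new_priority)

    def _evict_one(self):
        """Examine the item at the cold end of the coldest non-empty section."""
        for idx, section in enumerate(self.sections):
            if not section:
                continue
            key = section.pop()
            target = self._section_for(self.virtual_priority[key])
            if target > idx:
                # Its virtual priority rose while it waited, so reinsert it into a
                # warmer section now (lazy reinsertion) instead of on every hit.
                self.sections[target].appendleft(key)
            else:
                del self.virtual_priority[key]    # genuinely cold: evict it
            return

# Usage: "a" becomes hot after insertion and survives eviction via lazy reinsertion.
pq = RestrictedInsertionPQ(num_sections=4, capacity=3)
pq.insert("a", 0.1); pq.insert("b", 0.9); pq.insert("c", 0.2)
pq.access("a", 0.95)          # hit: raise "a"'s priority virtually, no rewrite yet
pq.insert("d", 0.5)           # over capacity: "c" is evicted, "a" is reinserted hot
```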

Once the FBCDN deployed RIPQ, it noticed that popular items were often clustered together in blocks, forming what the team called “hot blocks”; when such a block was evicted, it triggered numerous reinsertions into the cache, saturating the SSD’s bandwidth and causing latency spikes. To address this, a reinsertion-ratio limit was introduced: a block with a high reinsertion ratio (above 0.6) is moved directly to the queue head as a whole, and its items are reinserted individually only once the block has become colder.

Additionally, an exponential moving average of the reinsertion ratio was added so that blocks that are relatively colder than their peers can still be evicted, further reducing latency spikes and freeing up bandwidth; a rough sketch of both mitigations follows below. With these changes in place, the FBCDN is continuing its effort to improve RIPQ’s caching policies and, more generally, the hierarchy of memory and storage.
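The sketch below illustrates the two hot-block mitigations. The 0.6 threshold comes from Facebook's description, while the smoothing weight and the exact decision rule are assumptions made for illustration:

```python
# Hypothetical sketch of the hot-block handling described above; not Facebook's code.
REINSERTION_THRESHOLD = 0.6   # blocks above this ratio are treated as "hot blocks"
EMA_WEIGHT = 0.2              # smoothing factor for the moving average (assumed)

ema_ratio = 0.0               # exponential moving average over recently evicted blocks

def handle_block_at_eviction(reinserted: int, evicted: int) -> str:
    """Decide what to do with a block once it reaches the eviction end of the queue."""
    global ema_ratio
    ratio = reinserted / max(reinserted + evicted, 1)
    ema_ratio = EMA_WEIGHT * ratio + (1 - EMA_WEIGHT) * ema_ratio

    if ratio > REINSERTION_THRESHOLD and ratio >= ema_ratio:
        # Hot block: move the whole block to the queue head in one step rather than
        # saturating flash bandwidth with many individual item reinsertions.
        return "move whole block to queue head"
    # Block is below the threshold, or relatively colder than its peers: evict it.
    return "evict block"

print(handle_block_at_eviction(reinserted=70, evicted=30))   # -> move whole block to queue head
print(handle_block_at_eviction(reinserted=10, evicted=90))   # -> evict block
```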

Google’s SyntaxNet, Complete with Parsey McParseface, Now Open Source

SyntaxNet is Google’s neural-network framework for syntactic parsing, and it is now available as open-source software. SyntaxNet serves as a foundation for Natural Language Understanding (NLU) systems, tackling an AI-hard problem. It tags parts of speech (POS), determines the relationships between the words in a sentence, and represents those relationships as a dependency parse tree.
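For readers unfamiliar with dependency parses, the toy example below shows the kind of structure such a parser emits for a three-word sentence. The sentence, tags, and labels are illustrative only, not actual SyntaxNet output:

```python
# A toy dependency parse for "Alice saw Bob": a POS tag, a head index, and a
# relation label per word (head 0 is the artificial ROOT).
parse = [
    # (index, word,   POS,   head, relation)
    (1, "Alice", "NNP", 2, "nsubj"),   # "Alice" is the subject of "saw"
    (2, "saw",   "VBD", 0, "root"),    # "saw" is the root of the sentence
    (3, "Bob",   "NNP", 2, "dobj"),    # "Bob" is the direct object of "saw"
]

for idx, word, pos, head, rel in parse:
    head_word = "ROOT" if head == 0 else parse[head - 1][1]
    print(f"{word}/{pos} --{rel}--> {head_word}")
```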

This becomes a very challenging task once sentences reach 20 or 30 words, because the ambiguity of language means a single sentence can have tens of thousands of possible dependency parse trees. To keep construction of the tree tractable, SyntaxNet uses beam search, a heuristic search algorithm: it processes the sentence from left to right, scores the competing partial parses, and prunes unlikely hypotheses as it goes.
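The toy Python sketch below shows the general shape of left-to-right beam search with pruning. The scoring function is invented, and this is not SyntaxNet's implementation, which scores parsing decisions with a neural network:

```python
import heapq

def beam_search(tokens, expand, beam_width=8):
    """Generic left-to-right beam search: `expand(hypothesis, token)` yields
    (new_hypothesis, score_delta) pairs, and only the `beam_width` highest-scoring
    partial analyses survive each step."""
    beam = [((), 0.0)]                               # (partial analysis, total score)
    for token in tokens:
        candidates = []
        for hypothesis, score in beam:
            for new_hyp, delta in expand(hypothesis, token):
                candidates.append((new_hyp, score + delta))
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[1])  # prune
    return max(beam, key=lambda c: c[1])

# Toy expansion step: tag each word as NOUN or VERB with made-up scores.
def toy_expand(hypothesis, token):
    yield (hypothesis + ((token, "NOUN"),), 0.9 if token != "saw" else 0.1)
    yield (hypothesis + ((token, "VERB"),), 0.9 if token == "saw" else 0.1)

best, score = beam_search(["Alice", "saw", "Bob"], toy_expand, beam_width=2)
print(best)   # (('Alice', 'NOUN'), ('saw', 'VERB'), ('Bob', 'NOUN'))
```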

SyntaxNet is implemented in TensorFlow and ships with Parsey McParseface, which Google calls the most accurate English parser in the world. Parsey is powered by machine-learning models that parse and analyze linguistic structure, explaining the functional role of each word in English text. It detects individual dependencies between words with about 94% accuracy (linguists trained for this task typically reach 96-97%), and it achieves over 90% parse accuracy on the Google WebTreebank dataset.

A $5 Million Investment for 5G Upgrades and More by Saguna Networks

Led by one of its current stakeholders, CR Ventures, Israel-based Saguna Networks secured an additional $5 million in VC investment this month. Saguna Networks specializes in fully virtualized Mobile Edge Computing (MEC) for improving mobile broadband connections: its platform brings cloud computing into the Radio Access Network (RAN), creating an open ecosystem and growth engine inside the RAN and moving content closer to mobile users. Saguna Networks’ first round of VC investment raised $3.2 million in February 2015, and the company plans to use the new funding to strengthen its market presence and expand its offerings to keep pace with the growth of the Internet of Things and the impending arrival of 5G content delivery.

Google Helps Webmasters Clean Up Their Act

At the 2016 International World Wide Web Conference, Google presented the results of its paper “Remedying Web Hijacking: Notification Effectiveness and Webmaster Comprehension.” The study outlines what Google found when it attempted to contact the webmasters of the approximately 16,500 sites worldwide that are newly compromised each week.

These compromised sites expose over 10 million users to malware and scams each week. Google found that even when webmasters were aware that their site was infected, they were often forced to rebuild it from scratch because they had no clean copy of the website. Working directly with Google, however, about 75% of webmasters were able to re-secure their content, in a median of three days.

Google reported that getting in touch with the webmasters of compromised pages was often the most difficult step in re-securing a website. Google tried email (which led to successful re-securing for 75% of webmasters who had registered their site with Search Console), browser warnings (54% success), and search warnings (43% success). Google also gave webmasters tips for cleaning up harmful content, such as hidden files, redirects, or remote inclusions of scams and malware, improving cleanup time by 62%.

Finally, Google monitored sites after cleanup and found that 12% were compromised again within 30 days, a sign to the webmasters that the root cause of the breach had not been addressed. Google encouraged webmasters to maintain a reliable communication channel and to promptly provide the victims of a breach with clear recovery steps.
