Bitmovin, a startup that provides an adaptive streaming HTML5 player and a service that transcodes digital video and audio to various formats, has recently introduced per-title encoding, a groundbreaking technology. Founded in 2013 at the University of Klagenfurt, Austria, Bitmovin actively participates in the development of MPEG-DASH, an open standard that lets streaming video be played in HTML5 video and Flash players.
The Early Days of Per-Title Encoding
Per-Title Encoding is not new. Early research generally found that Per-Title Encoding worked in test environments, but wasn’t suitable for commercial use because it didn’t work with a fixed bitrate ladder. As every piece of content is unique, each video requires initial individual analysis.
In December 2015, after many years of research and development, Netflix introduced Per-Title Encoding at scale. It was able to increase the quality of experience and save bandwidth by increasing or decreasing the bitrate of every bitrate ladder entry based on a complexity measurement for each input file. YouTube was next, and today per-title encoding is showing up at more and more providers, now including Bitmovin.
What is Per-Title Encoding?
Per-title encoding is a form of encoding optimization that customizes the bitrate ladder for every video based on the complexity of the video file. The aim is to choose a bitrate that leaves enough room for the codec to capture the information needed for an ideal viewing experience, but no more. Each type of content, even each title, has a different optimal bitrate. Scenes with fast motion and fewer redundancies, such as sports events or action scenes, are more complex to encode and need a higher bitrate. Documentaries, on the other hand, generally have significantly less motion in most of their scenes, which allows the codec to compress the information further without losing quality.
To decide the best bitrate for the content, a high-quality metric is needed to measure against. Different types of content need to be encoded with different bitrate settings, so each title is first encoded at several of them. The next step is a Peak Signal-to-Noise Ratio (PSNR) analysis of each individual encoding to form an objective impression of the effectiveness of the encoding parameters. Based on this analysis, you create a custom bitrate ladder to encode each individual content file. Normally, this approach leads to an improved quality of experience and reduced bandwidth usage, both of which are essential to most online streaming providers.
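The PSNR analysis mentioned above can be illustrated with a minimal sketch. It uses the standard definition, PSNR = 10 * log10(MAX^2 / MSE), and assumes 8-bit pixel values passed as flat lists. The function name and the sample values are made up for illustration, and a real workflow would run this per frame on decoded video rather than on short lists.

```python
import math


def psnr(reference, encoded, max_value=255):
    """Return the PSNR in dB between two equally sized sequences of pixel values.

    Illustrative helper, not part of any Bitmovin API.
    """
    if len(reference) != len(encoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the source and the encoded pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, encoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)


# Made-up sample values standing in for one row of pixels.
source_pixels = [52, 55, 61, 66, 70, 61, 64, 73]
encoded_pixels = [52, 54, 61, 67, 70, 60, 64, 73]
print(f"PSNR: {psnr(source_pixels, encoded_pixels):.2f} dB")
```

A higher value means the encoding is closer to the source; identical inputs give an infinite PSNR.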
There are limitations to the PSNR method, however, once you apply this optimization to a significant number of titles. The Structural SIMilarity (SSIM) index, by contrast, is a tool that measures the similarity between two images. One image is a control image, which the second is compared against, allowing you to measure the results of your optimization. SSIM focuses on the changes (luminance, structure) that affect perceived quality; thus it is a perception-based model.
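To make the comparison between a control image and a second image concrete, here is a simplified sketch of the SSIM formula. It computes a single global score over the whole image, whereas practical SSIM implementations average the score over many small local windows. The constants follow the commonly used defaults (K1 = 0.01, K2 = 0.03); the function name and the flat-list input format are assumptions made for illustration.

```python
def global_ssim(reference, encoded, max_value=255):
    """Simplified single-window SSIM between two equally sized pixel sequences.

    Illustrative only: real SSIM is computed over local windows and averaged.
    """
    n = len(reference)
    if n != len(encoded) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_r = sum(reference) / n
    mean_e = sum(encoded) / n
    # Sample variances and covariance capture contrast and structure.
    var_r = sum((r - mean_r) ** 2 for r in reference) / (n - 1)
    var_e = sum((e - mean_e) ** 2 for e in encoded) / (n - 1)
    cov = sum((r - mean_r) * (e - mean_e)
              for r, e in zip(reference, encoded)) / (n - 1)
    # Small constants that stabilise the division for near-zero denominators.
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    luminance_and_structure = (2 * mean_r * mean_e + c1) * (2 * cov + c2)
    normaliser = (mean_r ** 2 + mean_e ** 2 + c1) * (var_r + var_e + c2)
    return luminance_and_structure / normaliser


control = [52, 55, 61, 66, 70, 61, 64, 73]
degraded = [60, 60, 60, 60, 70, 70, 70, 70]
print(global_ssim(control, control))   # identical images score 1.0 (up to rounding)
print(global_ssim(control, degraded))  # a degraded image scores below 1.0
```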
Bitmovin is always looking for ways to adapt a given bitrate ladder to the complexity requirements of the content to reduce costs and save time.
As demonstrated in the diagram above, the initial step of the Bitmovin per-title encoding approach is to determine a “complexity factor” for a given input. Every asset is encoded with a Constant Rate Factor (CRF) setting to measure its complexity, which is based on the degree of motion perceived in the content. A “complexity factor” ranging from 0.5 to 1.5 can then be allocated to the content. Content with a complexity factor between 1 and 1.5 is deemed high complexity, whereas content with a complexity factor between 0.5 and 1 is deemed less complex.
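The article does not say how the CRF encode is turned into a number, so the following is only a sketch under stated assumptions: the complexity factor is taken to be the ratio between the average bitrate of the CRF encode and a hypothetical reference bitrate, clamped to the 0.5 to 1.5 range described above. The function names, the reference bitrate, and the treatment of a factor of exactly 1 as "low" are all illustrative choices, not Bitmovin's actual method.

```python
def complexity_factor(crf_bitrate_kbps, reference_bitrate_kbps):
    """Map a CRF encode's average bitrate to a factor in [0.5, 1.5].

    Assumption: complexity is approximated by how much bitrate the CRF
    encode needed compared with a hypothetical reference bitrate.
    """
    if reference_bitrate_kbps <= 0:
        raise ValueError("reference bitrate must be positive")
    ratio = crf_bitrate_kbps / reference_bitrate_kbps
    # Clamp to the range the article gives for the complexity factor.
    return min(1.5, max(0.5, ratio))


def complexity_class(factor):
    """Label a factor as 'high' (above 1) or 'low' (1 or below)."""
    if not 0.5 <= factor <= 1.5:
        raise ValueError("complexity factor must be between 0.5 and 1.5")
    # The article leaves a factor of exactly 1 ambiguous; 'low' is assumed here.
    return "high" if factor > 1.0 else "low"


sports_factor = complexity_factor(6000, reference_bitrate_kbps=4000)
documentary_factor = complexity_factor(2800, reference_bitrate_kbps=4000)
print(sports_factor, complexity_class(sports_factor))
print(documentary_factor, complexity_class(documentary_factor))
```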
This, in combination with the resolution of the bitrate ladder entry, leads to an adjusted encoding profile for each asset. The new configuration file optimizes the encoding ladder by creating settings specific to each asset. For low complexity content, for example, the higher bitrate levels can be reduced without compromising visual quality; the lower bitrate levels are also reduced, but to a lesser degree, to avoid degrading the quality of the video. It works the other way around for high complexity content. The high bitrate levels are not significantly increased, as no significant gain in video quality would be made; the lower bitrate levels, however, are raised because adding bitrate there allows Bitmovin to significantly boost video quality. Modern codecs work better at larger resolutions, as these contain significantly larger uniform areas that can be compressed more effectively. Therefore, fewer bits per pixel are required at higher resolutions to achieve a similar level of quality than at smaller ones.
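The asymmetric adjustment described above can be sketched as follows. The weighting scheme is an assumption invented for illustration: the position of a rung in the ladder stands in for its resolution, low complexity content has its top rungs reduced the most, and high complexity content has its bottom rungs raised the most. The function name, the `min_weight` parameter, and the sample ladder are hypothetical and do not reflect Bitmovin's actual formula.

```python
def adjust_ladder(ladder_kbps, factor, min_weight=0.5):
    """Scale each rung of a bitrate ladder by a complexity factor in [0.5, 1.5].

    Hypothetical weighting: the rung that should change most gets weight 1.0,
    the rung that should change least gets min_weight.
    """
    if not 0.5 <= factor <= 1.5:
        raise ValueError("complexity factor must be between 0.5 and 1.5")
    rungs = sorted(ladder_kbps)
    top = len(rungs) - 1
    adjusted = []
    for i, bitrate in enumerate(rungs):
        position = i / top if top else 1.0  # 0.0 = lowest rung, 1.0 = highest
        if factor < 1.0:
            # Low complexity: reduce the high rungs more than the low ones.
            weight = min_weight + (1 - min_weight) * position
        else:
            # High complexity: raise the low rungs more than the high ones.
            weight = min_weight + (1 - min_weight) * (1 - position)
        scale = 1 + (factor - 1) * weight
        adjusted.append(round(bitrate * scale))
    return adjusted


default_ladder = [400, 800, 1600, 3200, 6400]  # made-up ladder in kbps
print(adjust_ladder(default_ladder, 0.6))  # [320, 600, 1120, 2080, 3840]
print(adjust_ladder(default_ladder, 1.4))  # [560, 1080, 2080, 4000, 7680]
```

With a factor of 0.6 the top rung drops by 40% while the bottom rung drops by only 20%; with a factor of 1.4 the bottom rung gains 40% while the top rung gains only 20%.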
This leads to ABR-encoded content, which is delivered to storage as part of the regular encoding workflow. For videos encoded with a PSNR of 45 dB or above relative to the source video, the user won’t notice a difference even though less information was used to deliver the content. However, a PSNR of 35 dB or below would show differences between the encoding and its source file that a user would pick up on.
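The two PSNR thresholds above translate directly into a simple check. In this sketch the function name and the labels are made up, and the range between 35 dB and 45 dB, which the text does not characterize, is reported as "indeterminate" by assumption.

```python
def perceived_difference(psnr_db):
    """Classify an encoding's PSNR (in dB, relative to its source)."""
    if psnr_db >= 45:
        return "imperceptible"  # viewer should not notice a difference
    if psnr_db <= 35:
        return "noticeable"     # viewer would pick up on the differences
    # The article does not describe this range; the label is an assumption.
    return "indeterminate"


for value in (47.2, 40.0, 33.5):
    print(value, perceived_difference(value))
```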
The relatively insignificant increase in encoding cost caused by the extra processing for the trial encodings is outweighed by the savings in bandwidth and the overall improvement in user experience.
Bitmovin provides demos with example encodings on its demonstration page.