Protocol Buffers vs. JSON

November 12, 2018

Protocol Buffers

Google’s design goals for Protocol Buffers were for it to be smaller, simpler and quicker than XML. The developers placed an emphasis on simplicity and performance. Google uses Protocol Buffers widely for storing and interchanging all kinds of structured information. Protocol Buffers also serves as the basis for a custom remote procedure call (RPC) system, which is used for nearly all inter-machine communication at the company.

Facebook uses an equivalent protocol called Apache Thrift, and Microsoft uses Microsoft Bond. Protocol Buffers also has a concrete RPC protocol stack for defined services, known as gRPC.

Protocol buffers are a language-neutral, platform-neutral, extensible mechanism for serializing structured data. Google describes their use as follows: “You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.”

Getting started with protocol buffers is straightforward: download and install the protocol buffer compiler (protoc), then work through Google’s overview and tutorial as needed.

JSON

JSON is mostly used in web applications to send data from the server to the browser. JSON data is typically transferred using Ajax, which lets a web application exchange data between the browser and the server without reloading the page.

An example from Elated:

A user clicks a product thumbnail in an online store.
The JavaScript running in the browser makes an Ajax request to a PHP script running on the server, passing it the ID of the clicked product.
The PHP script retrieves the product name, description, price, and other info from the products database, encodes the data as a JSON string, and sends the string back to the browser.
The JavaScript running in the browser decodes the JSON string and displays the product details in the page for the user.
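Steps 3 and 4 above boil down to one encode call on the server and one decode call in the browser. The original example uses PHP and JavaScript; the sketch below uses Python’s standard json module for both sides, and the product fields and values are hypothetical.

```python
import json

# Hypothetical product record, standing in for a row from the products database.
product = {
    "id": 42,
    "name": "Walnut Desk Lamp",
    "description": "Adjustable LED lamp with a walnut base",
    "price": 49.95,
    "in_stock": True,
}

# Server side (step 3): encode the record as a JSON string to send back.
response_body = json.dumps(product)

# Browser side (step 4): decode the string back into a structure to display.
decoded = json.loads(response_body)
print(decoded["name"], decoded["price"])
```

Because the response is plain text, the same string can be logged or inspected directly, which is part of JSON’s appeal for browser-facing services.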

JSON can also be used to send data from the browser to the server, by encoding the JSON string as a GET or POST parameter. This is less common, because the data sent in Ajax requests is typically fairly simple.
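The GET variant takes two steps: serialize the data to JSON, then URL-encode the string as a query parameter. In a browser this would be JavaScript, but the steps are the same. The sketch below uses Python’s standard library for both ends, and the endpoint path /cart/add and the parameter name data are hypothetical.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical data the browser wants to send to the server.
selection = {"product_id": 42, "quantity": 2}

# Browser side: serialize to JSON, then URL-encode it as a single GET parameter.
url = "/cart/add?" + urlencode({"data": json.dumps(selection)})

# Server side: extract the parameter from the query string and decode the JSON.
query = parse_qs(urlsplit(url).query)
received = json.loads(query["data"][0])
print(url)
print(received)
```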

JSON is built on objects (similar to what other languages call associative arrays or hashes) and arrays. The two main parts of JSON are keys and values, which together form a key/value pair.

Key: A key is a string enclosed in quotation marks.
Value: A value can be any of the following: a string, number, boolean, null, array, or object.
Key/Value Pair: A key/value pair follows a particular syntax: the key, then a colon, then the value. Key/value pairs are separated by commas.
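A quick way to check these syntax rules is to hand sample strings to a strict parser. The sketch below uses Python’s standard json module; the helper name is_valid_json and the sample strings are hypothetical.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON (illustrative helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Keys are double-quoted strings, each followed by a colon and a value;
# key/value pairs are separated by commas.
print(is_valid_json('{"name": "Desk Lamp", "price": 49.95}'))

# Single-quoted keys, unquoted keys, and a missing colon all break the syntax.
print(is_valid_json("{'name': 'Desk Lamp'}"))
print(is_valid_json('{name: "Desk Lamp"}'))
print(is_valid_json('{"name" "Desk Lamp"}'))
```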

There are six types of value:

Array: An ordered list of values, indicated by square brackets. For example, a list of categories or tags.
Boolean: true or false.
Null: An empty value, written as null.
Number: A number, which may be an integer or include a decimal fraction and/or an exponent.
Object: An unordered collection of key/value pairs, indicated by curly brackets. Everything inside the brackets is part of the object. A value can itself be an object.
String: A sequence of zero or more Unicode characters enclosed in double quotation marks.
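To make the value types concrete, the sketch below parses one document that contains a value of each JSON type (string, number, boolean, null, array, object) and prints the Python type the standard json module maps it to. The field names are made up for illustration.

```python
import json

# One key per JSON value type; field names are hypothetical.
document = """
{
  "title": "Desk Lamp",
  "price": 49.95,
  "stock": 12,
  "on_sale": false,
  "discontinued_on": null,
  "tags": ["lighting", "office"],
  "dimensions": {"height_cm": 45, "width_cm": 20}
}
"""

data = json.loads(document)

# JSON has a single number type; Python's json module returns int for
# numbers without a fraction or exponent, and float otherwise.
for key, value in data.items():
    print(f"{key:16} {type(value).__name__}")
```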

How do Protocol Buffers and JSON Compare?

Protocol Buffers and JSON messages can be used interchangeably; however, they were designed with different goals in mind.

JSON is derived from a subset of the JavaScript programming language. Its messages are exchanged in a text (human-readable) format, are language-independent, and are supported by almost all programming languages.

Protobuf is not only a message format. It is also a set of rules and tools for defining and exchanging messages. It is currently supported in fewer programming languages than JSON. In addition, it has more data types than JSON, such as enumerations, and offers further features, including service method definitions for RPC.
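As an illustration of those rules, the sketch below hand-encodes a single integer field in the Protobuf binary wire format and compares it with the equivalent JSON. It follows the encoding described in Google’s protobuf documentation: a tag built as (field_number << 3) | wire_type, followed by the value as a base-128 varint. The function names are hypothetical, and in practice you would use code generated by protoc rather than write bytes by hand.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # set the continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode a non-negative integer field with wire type 0 (varint)."""
    tag = (field_number << 3) | 0    # field number plus wire type 0
    return encode_varint(tag) + encode_varint(value)


# Equivalent of `message Test1 { int32 a = 1; }` with a = 150.
proto_bytes = encode_int_field(1, 150)
json_bytes = json.dumps({"a": 150}, separators=(",", ":")).encode()

print(proto_bytes.hex(" "), len(proto_bytes))  # 08 96 01 3
print(json_bytes, len(json_bytes))             # b'{"a":150}' 9
```

Note that the field name "a" never appears in the Protobuf bytes; only the field number does. Both sides must share the schema to interpret the message, which is the trade-off behind Protobuf’s compactness.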

Which Format is Preferable?

Opinions differ, but there are various online benchmarks showing that Protobuf performs faster than JSON, XML and other formats. However, every use case is different, so it is important to try both for your specific needs.

Auth0 Test – Auth0 tested how JSON and Protobuf performed against each other in a Spring Boot application, using two scenarios: Java to Java and JavaScript to Java. The goal was to measure “how this protocol would behave in an enterprise environment like Java and also on an environment where JSON is the native message format. That is, what I show here is data from an environment where JSON is built in and should perform extremely fast (JavaScript engines) and from an environment where JSON is not a first class citizen”. In short, Protobuf was faster than JSON in both scenarios.

DZone Test – Tao Wen on DZone ran his own test. He begins by pointing out that if you are switching from JSON to Protobuf purely for speed, Protobuf should be at least twice as fast; otherwise “it is not worth the effort”. He cites a study from DSL Platform showing that, in Java, JSON actually performs better. Wen argues that the reason to use Protobuf should not be speed, but rather “the awesome cross-language schema definition for data exchange”.

Code Climate Test – Michael Bernstein ran an excellent comparison on Code Climate. Bernstein begins by describing JSON’s advantages as a data interchange format – “it is human readable, well understood, and typically performs well”. He goes on to say that he believes Protocol Buffers are a better choice than JSON for encoding data.

He cites five primary reasons for believing this:

(i) Schemas Are Awesome – encoding the semantics of your business objects once, in proto format, helps “ensure that the signal doesn’t get lost between applications, and that the boundaries you create enforce your business rules”;

(ii) Backward Compatibility for Free – Protocol Buffers identifies every field by a number, which JSON does not; these numbered fields obviate the need for version checks and the “ugly code” that comes with them, making backward compatibility less of a challenge;
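The backward-compatibility argument rests on the field number carried in every encoded Protobuf field. A parser reads each tag’s field number and wire type, and the wire type tells it where the field ends, so an older reader can skip fields it does not recognize. Below is a minimal hand-rolled decoder sketch (function names are hypothetical, and only wire types 0 and 2 are handled) showing an older reader ignoring a field added by a newer writer. The bytes follow the worked examples in Google’s encoding documentation; real generated code handles all of this for you.

```python
from __future__ import annotations


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at buf[pos]; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:          # continuation bit clear: last byte
            return result, pos
        shift += 7


def decode_known_fields(buf: bytes, known: set[int]) -> dict[int, object]:
    """Decode fields whose numbers are in `known`; skip all others.

    Only wire types 0 (varint) and 2 (length-delimited) are handled here.
    """
    fields: dict[int, object] = {}
    pos = 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not supported in this sketch")
        if number in known:
            fields[number] = value   # a field this reader understands
        # fields with unknown numbers are skipped without error
    return fields


# A newer writer sends field 1 (a = 150) plus a new field 2 (b = "testing").
newer_message = bytes.fromhex("089601" "120774657374696e67")

# An older reader that only knows field 1 still decodes it correctly.
print(decode_known_fields(newer_message, known={1}))     # {1: 150}
print(decode_known_fields(newer_message, known={1, 2}))  # {1: 150, 2: b'testing'}
```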

(iii) Less Boilerplate Code – classes generated from Protocol Buffers evolve along with your schema, whereas JSON endpoints in HTTP-based services tend to rely on hand-written boilerplate code to encode and decode Ruby objects to and from JSON;

(iv) Validations and Extensibility – The keywords in Protocol Buffers definitions (such as required, optional, and repeated) are powerful, allowing you to encode the shape of your data structure and how the classes will work in each language;

(v) Easy Language Interoperability – Protocol Buffers is implemented in a wide variety of languages, which makes “interoperability between polyglot applications in your architecture that much simpler”.

Pros and Cons

To summarize, here is an overview of the pros and cons of each:

Protobuf is easier to bind to objects; comparing strings in JSON can be slow.
As JSON is textual, its integers and floats can be slow to encode and decode.
JSON is not designed for efficient handling of numbers.
JSON is useful when you want data to be human readable.
If your server-side application is written in JavaScript, JSON makes the most sense.
Parsing JSON strings, arrays, and objects requires a sequential scan, because there is no element size or count in a header ahead of the body.
Parsing is sequential in both the Protobuf library and JSON, so when both run on the same CPU core it is difficult for either format to achieve a significant performance boost.
A particular Protobuf library implementation is not necessarily faster than a JSON library, even though Protobuf appears to be the faster format overall. If the parser is not well optimized, extra memory allocations or copies will make it slower.
JSON can be significantly slower in JMH benchmarks.
Integer encoding and decoding should be especially fast in Protobuf, which tends to beat JSON at integer encoding.
Protobuf has been shown to be significantly faster than JSON at decoding doubles (JSON is poorly suited to floating-point numbers).
JSON offers various advantages over Protocol Buffers for services that are directly consumed by a web browser.
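Several points in this list concern numbers. The sketch below compares, for a few sample values, the size of JSON’s text form with Protobuf’s binary form: a varint for integers, and a fixed 8-byte little-endian IEEE 754 value for doubles (wire type 1), packed with Python’s struct module. The helper varint_size is hypothetical, the tag byte is ignored, and the sketch measures encoded size only, not speed.

```python
import json
import struct


def varint_size(value: int) -> int:
    """Number of bytes a non-negative integer occupies as a Protobuf varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


# Integers: JSON writes decimal digits; a varint uses 7 payload bits per byte.
for n in (5, 300, 1_000_000, 2**40):
    print(f"int {n:>15}: JSON {len(json.dumps(n)):2} bytes, "
          f"Protobuf varint {varint_size(n)} bytes")

# Doubles: JSON writes the shortest decimal text that round-trips;
# Protobuf always stores the raw 8-byte IEEE 754 value.
for x in (0.5, 0.1, 1 / 3, 6.02214076e23):
    text = json.dumps(x)
    raw = struct.pack("<d", x)
    assert json.loads(text) == x and struct.unpack("<d", raw)[0] == x
    print(f"double {text:>22}: JSON {len(text):2} bytes, Protobuf {len(raw)} bytes")
```

For short decimals such as 0.5, JSON’s text is actually smaller. Protobuf’s advantage for doubles lies mainly in decoding: reading a fixed-width binary value instead of parsing decimal text.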

Again, it is important to try both and see which best suits your particular use case.