ElasticSearch Overview

Elastic is an Amsterdam-based company that has experienced rapid growth in recent years and has received more than $104 million in funding. Its ElasticSearch search engine (the company’s namesake) allows businesses to store, search, and analyze data in real time and quickly draw actionable results from complex queries.

Big Data is king, and ElasticSearch’s allure is that it promises control over and visibility into that data, along with the performance benefits that come with it. The free, open-source search engine has been downloaded more than 75 million times and counts Netflix, Google, Wikipedia, and LinkedIn among its adopters.

ElasticSearch is a distributed document store and search engine built on Apache Lucene, a Java-based full-text search library. Clients written in languages other than Java communicate with the engine through its RESTful API over HTTP. It also supports a range of search features, including multilingual search, geolocation, contextual suggestions, autocomplete, and result snippets.
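To make the REST-over-HTTP point concrete, here is a minimal Python sketch that builds a full-text search request as JSON. The host and port, the index name "articles", and the field name "title" are assumptions made for illustration and do not come from this post. The request is only constructed, not sent, so the snippet runs without a live node.

```python
import json
import urllib.request

# Assumptions for illustration only: a node listening on localhost:9200 and
# an index named "articles" whose documents have a "title" field.
ES_URL = "http://localhost:9200"
INDEX = "articles"


def build_search_request(term: str) -> urllib.request.Request:
    """Build (but do not send) a full-text search request for `term`."""
    body = {"query": {"match": {"title": term}}}
    return urllib.request.Request(
        url=f"{ES_URL}/{INDEX}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("distributed search")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Against a running node, passing this object to urllib.request.urlopen would send the query and return the matching documents as JSON.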

Crucially, ElasticSearch is interoperable with the suite of open-source software products and proprietary extensions that comprise Elastic Stack 5.0, creating a single point of accountability for enterprises and streamlining the log analysis process. We’ll break down the other components of Elastic Stack 5.0 in following posts, whereas here we’ll primarily focus on the characteristics of the search engine anchoring the entire stack.

Breakdown of ElasticSearch

ElasticSearch automatically indexes data as JSON documents, which are full-text searchable, and retrieves them in near real-time. Related JSON documents are grouped into an index. Each index is stored in at least one primary shard and zero or more replica shards (as a redundancy to prevent data loss). Every shard is an instance of Lucene and is stored on a data node. You can also configure how many primary shards an index has, although the default number is five. Groups of data nodes form a cluster.
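The relationship between an index and its shards can be sketched in a few lines of Python. The settings body below uses the five-primary default mentioned above; the replica count of one is an assumption chosen for the example. The pick_primary_shard helper is hypothetical and uses CRC32 as a stand-in hash; it only illustrates the idea that a document ID maps deterministically to one primary shard and is not the engine's actual hashing code.

```python
import zlib

# Index settings body: five primaries (the default cited in the post) and one
# replica per primary (an assumption chosen for this example).
index_settings = {
    "settings": {
        "number_of_shards": 5,     # primary shards
        "number_of_replicas": 1,   # replica copies of each primary
    }
}


def total_shard_copies(settings: dict) -> int:
    """Count every shard copy: each primary plus its replicas."""
    s = settings["settings"]
    return s["number_of_shards"] * (1 + s["number_of_replicas"])


def pick_primary_shard(doc_id: str, number_of_shards: int) -> int:
    """Hypothetical helper: map a document ID to a primary shard number.

    CRC32 is a stand-in hash used only to show the hash-then-modulo idea.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


print(total_shard_copies(index_settings))
print(pick_primary_shard("doc-1", index_settings["settings"]["number_of_shards"]))
```

With these settings the cluster holds ten Lucene instances for the index: five primaries and five replicas.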

The data nodes within a cluster perform functions related to indexing, searching, and aggregating data. Out of these nodes, a master node is elected to coordinate cluster-level tasks such as distributing shards across data nodes and creating and deleting indices. For instance, the master node ensures that replica shards are not assigned to the same node as their primaries to prevent data loss. In the event of a master node failure, the master-eligible nodes elect a new master.
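The rule that a replica never shares a node with its primary can be illustrated with a toy allocator. This is a simplified sketch and not the master node's real allocation logic: the allocate_shards function, the round-robin strategy, and the node names are all assumptions made for the example.

```python
def allocate_shards(nodes, num_primaries, num_replicas):
    """Toy allocator: spread shard copies so no node holds two copies of a shard.

    Copies of shard N go on consecutive nodes, starting at node N % len(nodes).
    """
    if num_replicas >= len(nodes):
        # This toy simply refuses: there are not enough nodes to keep every
        # copy of a shard on a different node.
        raise ValueError("need more nodes than replicas per shard")

    allocation = {node: [] for node in nodes}
    for shard in range(num_primaries):
        start = shard % len(nodes)
        for copy in range(num_replicas + 1):
            node = nodes[(start + copy) % len(nodes)]
            role = "primary" if copy == 0 else "replica"
            allocation[node].append((shard, role))
    return allocation


layout = allocate_shards(["node-a", "node-b", "node-c"], num_primaries=5, num_replicas=1)
for node, shards in layout.items():
    print(node, shards)
```

Because each shard's copies land on distinct nodes, losing any single node in this layout still leaves at least one copy of every shard available.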

As its name suggests, ElasticSearch is also flexible and easy to scale horizontally. You can scale by simply adding new nodes to a cluster, and shards are redistributed across them to spread the load. The engine also supports multitenancy, meaning that when users create a new index with all of their documents, they can add it to an existing cluster. Users can then search within their own individual index or across all indices for all users.

Inverted indexing is integral to ElasticSearch’s full-text search capabilities. Whereas a traditional (forward) index maps each document to the terms it contains, an inverted index does the opposite. Each time a document is indexed, ElasticSearch automatically updates an inverted index that maps terms to the documents that contain them and stores statistics about the terms to make term-based searches faster and more efficient.
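A small Python sketch shows the idea of mapping terms to documents. It is a deliberately simplified illustration, not Lucene's implementation: the tokenizer, the sample documents, and the function names are assumptions made for the example, and the only statistic kept is how many documents contain each term.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return postings


def search(postings, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(postings.get(terms[0], set()))
    for term in terms[1:]:
        result &= postings.get(term, set())
    return result


# Sample documents invented for this example.
docs = {
    1: "Elastic search engine",
    2: "Distributed document store",
    3: "Distributed search engine",
}
index = build_inverted_index(docs)
print(search(index, "search engine"))   # documents 1 and 3
print(len(index["distributed"]))        # document frequency of "distributed": 2
```

A term lookup here is a dictionary access followed by set intersections, so the cost depends on how many documents contain the query terms rather than on scanning every document.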

All in all, the key features of ElasticSearch are that it supports multitenancy, enables near real-time data retrieval, automates the data structuring process, and is elastic, horizontally scalable, and reliable given its built-in fail-safes.
