Apache Flink: Stream Processing Will Conquer The Enterprise in 2018

According to a new data Artisans survey of Apache Flink users, the majority of surveyed businesses plan to deploy more applications powered by Apache Flink in the year ahead.

Apache Flink is an open-source stream processing framework for “distributed, high-performing, always-available, and accurate” data streaming applications.

According to data Artisans, “Stream processing is the processing of data in motion, or in other words, computing on data directly as it is produced or received.” Most data is born as a continuous stream: user activity on a website, financial trades, and so on. This data is created as a series of events over time.

Before the advent of stream processing, this kind of data was first written to mass storage. Applications would then query the stored data or compute across it as necessary.

Stream processing turns this paradigm around: “The application logic, analytics, and queries exist continuously, and data flows through them continuously.”

As soon as a stream processing application receives an event from the stream, it reacts to that event: it can trigger an action, update an aggregate or other statistic, or “remember” the event for future reference.
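
The three reactions described above can be sketched in a few lines of plain Python. This is only an illustration, not Flink code: the `ClickCounter` class, its `on_event` method, the event fields, and the alert threshold are all hypothetical names chosen for the example.

```python
from collections import defaultdict, deque


class ClickCounter:
    """Toy streaming application (hypothetical, not a Flink API).

    The logic and its state live in the application; events flow through it.
    """

    def __init__(self, alert_threshold=3, history_size=100):
        self.counts = defaultdict(int)            # running aggregate per user
        self.recent = deque(maxlen=history_size)  # "remembered" events
        self.alerts = []                          # actions triggered so far
        self.alert_threshold = alert_threshold

    def on_event(self, event):
        user = event["user"]
        # 1. Update an aggregate.
        self.counts[user] += 1
        # 2. Remember the event for later reference.
        self.recent.append(event)
        # 3. Trigger an action when a condition is met.
        if self.counts[user] == self.alert_threshold:
            self.alerts.append(f"user {user} reached {self.alert_threshold} events")


def run_demo():
    app = ClickCounter(alert_threshold=3)
    # Made-up sample events standing in for an unbounded stream.
    stream = [
        {"user": "a", "page": "/home"},
        {"user": "b", "page": "/home"},
        {"user": "a", "page": "/cart"},
        {"user": "a", "page": "/pay"},
    ]
    for event in stream:
        app.on_event(event)  # react to each event as it arrives
    return app
```

A real stream processor would additionally distribute this state across machines and checkpoint it for fault tolerance; the sketch keeps everything in one process to show only the per-event reaction.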

Furthermore, streaming computations can process multiple data streams jointly, and every computation over an event stream can itself produce further event streams.
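
As a rough illustration of processing two streams jointly, the following Python sketch merges two timestamp-ordered streams and emits a third, derived stream. The order/payment scenario, the `join_streams` function, and the field names (`ts`, `order_id`, `amount`) are assumptions made up for this example; a production stream processor would handle out-of-order events and state expiry, which this sketch does not.

```python
import heapq


def join_streams(orders, payments):
    """Join two timestamp-ordered streams on order_id (hypothetical example).

    Yields a new event for every order that has a matching payment,
    regardless of which of the two arrives first.
    """
    pending = {"order": {}, "payment": {}}  # unmatched events, kept as state

    # Interleave both inputs by timestamp so they are processed as one flow.
    merged = heapq.merge(
        (("order", e) for e in orders),
        (("payment", e) for e in payments),
        key=lambda tagged: tagged[1]["ts"],
    )

    for kind, event in merged:
        other = "payment" if kind == "order" else "order"
        oid = event["order_id"]
        if oid in pending[other]:
            match = pending[other].pop(oid)
            order, payment = (event, match) if kind == "order" else (match, event)
            # The output is itself an event that downstream logic can consume.
            yield {
                "order_id": oid,
                "amount": order["amount"],
                "paid_at": payment["ts"],
            }
        else:
            pending[kind][oid] = event


def run_join_demo():
    # Made-up sample data; each list is already sorted by timestamp.
    orders = [
        {"ts": 1, "order_id": "o1", "amount": 20},
        {"ts": 4, "order_id": "o2", "amount": 5},
    ]
    payments = [
        {"ts": 3, "order_id": "o1"},
        {"ts": 6, "order_id": "o2"},
        {"ts": 7, "order_id": "o9"},  # no matching order, stays pending
    ]
    return list(join_streams(orders, payments))
```

Because `join_streams` is a generator, its output can be fed directly into another computation, which mirrors the idea that one streaming computation produces the input of the next.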

The systems that receive and send the data streams and execute the application or analytics logic are known as stream processors. Their job is to ensure that data flows efficiently and that the computation scales and is fault tolerant.

Companies around the globe that manage high volumes of data and use Apache Flink as their stream processing platform include Alibaba, ING, Netflix, SK Telecom, and Uber.

One quarter of respondents said they are processing at least 1 billion events per day, with 1 percent processing at least 1 trillion events per day. Event volumes are expected to grow substantially as organizations deploy more live data applications.

The survey covered 217 IT leaders, software engineers, application developers, and data/systems architects from 28 countries. The ability to react to data instantly, without lag time, is emerging as a critical priority among businesses of all sizes, from small organizations earning under $1 million in annual sales (10 percent of respondents) to very large enterprises with over $1 billion in earnings (18 percent of respondents).

Among the new application types that developers are building or planning to build on Apache Flink, the most popular are machine learning (64 percent), for both model scoring (34 percent) and model training (30 percent); anomaly detection/system monitoring (27 percent); and business intelligence/reporting (25 percent). These are followed by recommendation/decisioning engines (22 percent) and security/fraud detection (19 percent).

“This year’s survey presents clear evidence that stream processing is becoming widely adopted across enterprises of all sizes and in a variety of industries outside of technology, with financial services, insurance, real estate and telecommunications leading the pack,” said Kostas Tzoumas, co-founder and CEO of data Artisans and a PMC member of Apache Flink. He added, “The market is expected to reach upwards of $13 billion USD by 2021, and we’re seeing a range of new applications being put into production, including machine learning, security and fraud detection, systems monitoring and Internet of Things.”

data Artisans reported the benefits companies are seeing with Apache Flink, and noted that 92 percent of respondents had expressed satisfaction with the service, with 58 percent of those very or completely satisfied. The four top areas of satisfaction were: (i) event time handling; (ii) DataStream API (stream processing); (iii) throughput and latency; and (iv) windowing and watermarks.

Of those who are intending to deploy more Flink applications in 2018:

  • 62 percent say they will deploy one to five more applications
  • 11 percent say six to 10 more applications
  • 8 percent say 10+ applications
  • 7 percent expect to deploy 20+ additional applications in 2018

Most of the surveyed respondents (70 percent) said their team or department will be hiring in 2018, and 59 percent expect their team or departmental budget to increase.

This was the second-ever Apache Flink user survey, designed to “better understand Flink usage in the community, asking for feedback about both common patterns and the most-needed Flink features.”
