In the connected world of today’s digital economy, apps, IoT devices, vehicles, appliances and servers generate an endless stream of event data. Each event is typically small, but events stream in at high speed, accumulating into huge volumes over time. The stream of events describes what is happening over time and offers the opportunity to track and analyze things as they happen. With access to complete sets of event data, businesses can derive critical insights from comprehensive analysis of all the data.
Event-Driven Applications are Driving Modern Workloads
In the fast race of digital transformation, enterprises are challenged to manage and analyze exploding volumes of event data. To harness event data for timely action, enterprises must develop and deploy an increasing number of event-driven applications. These modern, event-driven workloads impose a unique set of challenges on analytics systems.
- High speed data
Data streams in at high velocity, with millions of events per second from a growing number of users, online applications and mobile devices. This high-speed data demands fast ingest capabilities.
- High volume data
The streaming data accumulates to hundreds of gigabytes, and eventually petabytes, over time. Such volumes require multi-tier storage that is economical and offers easy access for any processing application.
- Open access to data
Organizations are leveraging multiple sources of data for advanced analytics, so a plethora of applications will require access to the event data. Open, shared access to these massive datasets without data duplication is a critical necessity.
- High performance, low latency, real-time analytics
The value of event data lies in deriving real-time insights for timely action. Real-time analytics requires immediate, low-latency access to just-arrived data, while comprehensive analysis requires access to a mix of historical and the most recent data.
To manage and leverage event data, enterprises require a well-integrated system that tames the complexity of ingesting, persisting and analyzing event data at scale. IBM EventStore offers these capabilities in a single system.
IBM EventStore for Event Driven Applications
IBM EventStore is purpose-built for modern event-driven workloads. The following architectural principles of IBM EventStore help enterprises address the challenges associated with event data management.
- Distributed architecture offers scale
EventStore is implemented as a distributed cluster of nodes, with each node hosting the core engine and Spark executors. Data is replicated across nodes for high availability, and adding nodes to the cluster scales CPU and storage resources and supports higher ingestion rates. The cluster can ingest one million inserts per second per node.
- In-memory optimizations deliver performance
Ingested data is first persisted to in-memory logs, and an in-memory index is maintained to enable fast lookups.
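The index-at-ingest idea can be illustrated with a minimal sketch. This is not the EventStore implementation or API; the class, field names and `device_id` key are assumptions chosen for illustration. The point is that maintaining an index while appending lets lookups avoid a full scan of the log.

```python
from collections import defaultdict

class InMemoryEventLog:
    """Toy append-only in-memory log with a lookup index (illustrative only)."""

    def __init__(self, index_key):
        self._log = []                   # append-only list of events
        self._index = defaultdict(list)  # index-key value -> log offsets
        self._index_key = index_key

    def append(self, event):
        offset = len(self._log)
        self._log.append(event)
        # Maintain the index at ingest time so lookups stay fast.
        self._index[event[self._index_key]].append(offset)
        return offset

    def lookup(self, key_value):
        # Point lookup via the index instead of scanning the whole log.
        return [self._log[i] for i in self._index.get(key_value, [])]

log = InMemoryEventLog(index_key="device_id")
log.append({"device_id": "d1", "temp": 21.5})
log.append({"device_id": "d2", "temp": 19.0})
log.append({"device_id": "d1", "temp": 22.1})
print(log.lookup("d1"))  # both d1 events, in arrival order
```

A real engine would bound the memory used by the log (see the grooming step below in the source text) and use a concurrent, persistent index structure; the dictionary here only conveys the shape of the idea.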
- Mixed storage architecture enables analytics on both recent and historical data
EventStore offers a single system that provides access to both recent and historical data. Ingested data is initially persisted to memory and synchronously written to queryable logs on SSDs locally attached to the nodes. At regular intervals, the data in the logs is groomed and written as Parquet data blocks to highly available distributed storage for long-term persistence. The logs act as an operational data store, and the distributed storage serves as the long-term persistent store. Data in both stores can be queried for low-latency, real-time analytics.
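The log-then-groom pattern described above can be sketched in a few lines. This is an illustrative simulation, not IBM’s code: the class name, the groom threshold, and the use of tuples to stand in for immutable Parquet blocks on shared storage are all assumptions. What it shows is that a query can transparently union the operational tier (recent events) with the long-term tier (groomed blocks).

```python
class TwoTierStore:
    """Illustrative two-tier store: operational log plus groomed blocks."""

    def __init__(self, groom_threshold=3):
        self.log = []      # operational store: most recent events
        self.blocks = []   # long-term store: sealed, immutable blocks
        self.groom_threshold = groom_threshold

    def ingest(self, event):
        self.log.append(event)
        if len(self.log) >= self.groom_threshold:
            self._groom()

    def _groom(self):
        # Seal the current log contents into an immutable block and reset
        # the log; a real system writes a columnar Parquet file here.
        self.blocks.append(tuple(self.log))
        self.log = []

    def query(self, predicate):
        # A query sees recent (log) and historical (block) data together.
        historical = [e for block in self.blocks for e in block]
        return [e for e in historical + self.log if predicate(e)]

store = TwoTierStore(groom_threshold=2)
for t in [18, 25, 30, 12]:
    store.ingest({"temp": t})
hot = store.query(lambda e: e["temp"] > 20)
print(len(store.blocks), hot)  # two groomed blocks; 25 and 30 match
```

Grooming on a count threshold keeps the sketch small; the source text describes grooming at regular time intervals, which would simply replace the threshold check with a timer.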
- Open data format enables processing from any application
The ingested data persisted to operational storage is groomed into Parquet data blocks and written to persistent storage. The Parquet-based open data format enables any application to process the event data.
- Accelerated Spark analytics delivers high performance
The core EventStore engine is optimized to accelerate Spark SQL queries; it operates on compressed data and can perform complex joins. Once an OLAP query is parsed by the optimizer, the EventStore engine is engaged for queries that require access to the most recent data, and Spark executors are engaged for queries that require access to historical data.
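The recent-versus-historical dispatch can be pictured with a hypothetical routing sketch. The function, the fixed 15-minute cutoff, and the engine labels are all assumptions for illustration; they are not the EventStore optimizer’s actual logic. A query whose window spans the cutoff would in practice consult both paths and merge the results.

```python
import time

RECENT_WINDOW_SECONDS = 15 * 60  # assumed cutoff for "recent" data

def route_query(query_start_ts, now=None):
    """Return which path would serve a query over [query_start_ts, now]."""
    now = time.time() if now is None else now
    if query_start_ts >= now - RECENT_WINDOW_SECONDS:
        return "log-engine"       # recent data: optimized core engine
    return "spark-executors"      # historical data: groomed Parquet blocks

now = 1_000_000.0
print(route_query(now - 60, now=now))    # inside the recent window
print(route_query(now - 3600, now=now))  # older than the recent window
```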
- Docker and Kubernetes based setup eases deployment
IBM EventStore is built on Docker containers and the Kubernetes orchestration engine. IBM’s Download and Go packaging offers a single-click install and setup experience that eases deployment.
Event Data Management with IBM EventStore
IBM EventStore, combined with IBM Data Science Experience, IBM Streams and IBM BigSQL, enhances the capabilities to manage and harness event data, realizing an Event Data Management System.
- IBM Data Science Experience offers an end-to-end machine learning platform to build, train and deploy machine learning models. Data scientists can use notebooks for interactive, exploratory analysis of event data and to build machine learning models.
- IBM Streams enables integration with an extensive range of event sources. This mediation helps enrich streaming data or drive in-flight scoring.
- IBM BigSQL offers the best SQL compliance and performance for queries on historical data.
The current developer preview build combines EventStore with IBM Data Science Experience. Later builds for advanced editions will include IBM Streams and IBM BigSQL.
This is part 1 of 3 in the blog series. It provided an overview of IBM EventStore and how its capabilities help enterprises address the complexity of harnessing exploding volumes of event data. Event-driven applications span a range of use cases in IoT, web analytics, mobile gaming, billing, fraud detection and utilities monitoring. The next two parts in the series will focus on a clickstream analysis use case and outline its implementation with IBM EventStore.