Key concepts
Enriching events with the latest contextual data is crucial for processing advanced real-time business insights
This article introduces high-level concepts about the enrichment processor, providing a foundational understanding of key topics and how they are used.
We will learn how Joule links contextual data to stream events within processing pipelines that subscribe to source data and publish results.
This article covers:
Enrichment architecture An overview of enrichment architecture components and how they work together.
Supported data stores List of supported enrichment data stores.
Worked example Learn how to apply enrichment within a stream processing pipeline.
Key concerns Selecting the optimal data store for read-heavy enrichment is a key decision to get right.
Out of the box, Joule provides a low-latency event enrichment solution for advanced use cases.
This core feature enables the user to focus on composing data requirements using a pragmatic enrichment query DSL.
The Joule contextual data architecture supports various low-latency in-memory data stores that reduce the inherent I/O overhead of out-of-process databases and thus enable stream-based enrichment.
Note: contextual data is a generic term for any associative or calculated data point that can provide additional context to an event.
The architecture components work together to provide the necessary functions for a high-performance enrichment process.
User enrichment DSL Enables the user to define a field-level enrichment query, response type and data source using a flexible syntax.
Contextual data interface Provides a standardised access abstraction over the underlying data sources.
In-memory data stores Provides access to the internal high-performance in-memory data store, along with a reference implementation of a proven enterprise data cluster solution.
S3 data access Standardised S3 data access for cross-cloud collaboration.
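To make these components concrete, the following is a minimal sketch of how a field-level enrichment definition might look. The keys, query syntax and structure shown here are illustrative assumptions, not the exact Joule DSL; consult the enrichment DSL reference for the confirmed syntax.

```yaml
# Illustrative sketch only — attribute names are hypothetical, not confirmed Joule syntax.
enricher:
  fields:
    fx_rate:                                        # new attribute appended to each event
      query: select rate from fx_rates where ccy = ?
      query fields: [ currency ]                    # event attribute bound to the '?' parameter
      response type: value                          # expect a single scalar back
      store: ratesStore                             # contextual data source declared separately
```

The intent is that the user only composes the data requirement (query, binding and response shape), while the contextual data interface handles how the underlying store is actually accessed.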
These are the supported out-of-the-box enrichment stores Joule provides:
Joule DB
S3
Distributed caching cluster
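A hedged sketch of how the three store types might be declared is shown below. Store type identifiers, keys and the bucket name are assumptions for illustration only; the actual declaration syntax is defined in the Joule configuration documentation.

```yaml
# Illustrative only — store type names and keys are assumptions, not confirmed syntax.
stores:
  ratesStore:
    type: jouledb                  # internal high-performance in-memory database
  referenceData:
    type: s3                       # standardised S3 access for cross-cloud reference data
    bucket: example-reference-data # hypothetical bucket name
  hotCache:
    type: geode                    # distributed caching cluster (Apache Geode)
```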
The banking example demonstrates how to enrich events with both contextual data from an imported data set and computed metrics. Both forms use the internal in-memory database.
See Banking example documentation for a complete worked solution.
Enriching events is an expensive read operation, and the cost becomes apparent during high-frequency event processing. If the enrichment processor is not configured optimally, at worst Joule will fail to produce results due to heavy read operations.
For example, when Joule emits output events slowly relative to a high rate of incoming events, despite minimal complex processing, there are generally repeatable indicators that can be observed.
Slow stream of output events:
Output events are produced at a significantly lower rate.
Typically indicates high read operations occurring due to constant database queries in the enrichment processor.
Joule process failure:
Large volume of events arriving continuously causes available memory to be consumed.
This causes the JVM to fail with an OutOfMemoryError and exit.
If you are using the internal in-memory database, the following approaches are the first steps towards a better-performing solution:
Ensure you are using the in-memory database only by setting joule.db.memoryOnly=true (see system properties).
Set a unique index on contextual tables.
Reduce the number of events to be enriched by applying filtering earlier in the process.
Perform enrichment after a window based analytic process.
Scale Joule to leverage Kafka topic partitioning.
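The filtering and windowing recommendations above amount to ordering the pipeline so the enricher sees as few events as possible. A rough sketch of that ordering is below; the processor names and expression syntax are hypothetical placeholders, not confirmed Joule DSL.

```yaml
# Illustrative pipeline ordering — processor names and expressions are hypothetical.
pipeline:
  - filter:                         # drop irrelevant events first to cut read load
      expression: "amount > 1000"
  - time window:                    # aggregate before enriching, so enrichment
      type: tumbling                # runs once per window rather than per event
      size: 60s
  - enricher:                       # enrichment now performs far fewer store reads
      fields:
        ...
```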
If the above approach does not resolve your issue, there is still a reliable alternative: embedding a caching solution, such as a key-value store, directly within the process.
Joule provides such a solution in the form of Apache Geode, an enterprise-grade technology that has been battle-tested over many years in demanding high-frequency, low-latency environments.
Further information on how to use this technology can be found here.