Key concepts
Enriching events with the latest contextual data is crucial for processing advanced real-time business insights
This article introduces high-level concepts about the enrichment processor, providing a foundational understanding of key topics and how they are used.
We will learn how Joule links contextual data to stream events within processing pipelines that subscribe to source data and publish results.
This article covers:
Enrichment architecture An overview of enrichment architecture components and how they work together.
Supported data stores List of supported enrichment data stores.
Worked example Learn how to apply enrichment within a stream processing pipeline.
Key concerns Selecting the optimal data store for read-heavy enrichment is a key decision to get right.
Out of the box, Joule provides a low-latency event enrichment solution for advanced use cases.
This core feature enables the user to focus on composing data requirements using a pragmatic enrichment query DSL.
The Joule contextual data architecture supports various low-latency in-memory data stores that reduce the inherent I/O overhead of out-of-process databases and thus enable stream-based enrichment.
Note: contextual data is a generic term for any associative or calculated data point that can provide additional context to an event.
The architecture components work together to provide the necessary functions for a high-performance enrichment process.
User enrichment DSL Enables the user to define a field-level enrichment query, response type and data source using a flexible syntax.
Contextual data interface Provides a standardised access abstraction over the underlying data sources.
In-memory data stores Provides access to the internal high-performance in-memory data store, along with a reference implementation of a proven enterprise data cluster solution.
S3 data access Standardised S3 data access for cross-cloud collaboration.
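To make these components concrete, the following is a minimal sketch of how a field-level enrichment definition might look. The keys, query syntax and structure shown here are illustrative assumptions, not the exact Joule DSL; consult the enrichment DSL reference for the confirmed syntax.

```yaml
# Illustrative sketch only — attribute names are hypothetical, not confirmed Joule syntax.
enricher:
  fields:
    fx_rate:                                        # new attribute appended to each event
      query: select rate from fx_rates where ccy = ?
      query fields: [ currency ]                    # event attribute bound to the '?' parameter
      response type: value                          # expect a single scalar back
      store: ratesStore                             # contextual data source declared separately
```

The intent is that the user only composes the data requirement (query, binding and response shape), while the contextual data interface handles how the underlying store is actually accessed.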
These are the supported out-of-the-box enrichment stores Joule provides:
Joule DB
S3
Distributed caching cluster
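A hedged sketch of how the three store types might be declared is shown below. Store type identifiers, keys and the bucket name are assumptions for illustration only; the actual declaration syntax is defined in the Joule configuration documentation.

```yaml
# Illustrative only — store type names and keys are assumptions, not confirmed syntax.
stores:
  ratesStore:
    type: jouledb                  # internal high-performance in-memory database
  referenceData:
    type: s3                       # standardised S3 access for cross-cloud reference data
    bucket: example-reference-data # hypothetical bucket name
  hotCache:
    type: geode                    # distributed caching cluster (Apache Geode)
```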
The banking example demonstrates how to enrich events with both contextual data from an imported data set and computed metrics. Both forms use the internal in-memory database.
See Banking example documentation for a complete worked solution.
Enriching events is an expensive read operation, and the cost becomes apparent during high-frequency event processing. If the enrichment processor is not configured optimally, at worst Joule will fail to produce results due to heavy read operations.
For example, when Joule emits output events slowly relative to a high rate of incoming events, despite minimal complex processing, there are generally repeatable indicators that can be observed.
Slow stream of output events:
Output events are produced at a significantly lower rate.
Typically indicates high read operations occurring due to constant database queries in the enrichment processor.
Joule process failure:
Large volume of events arriving continuously causes available memory to be consumed.
This causes the JVM to fail with an OutOfMemoryError and exit.
If you are using the internal in-memory database, the following approaches are the first steps towards a better-performing solution:
Ensure you are using the in-memory database only by setting joule.db.memoryOnly=true (see system properties).
Set a unique index on contextual tables.
Reduce the number of events to be enriched by applying filtering earlier in the process.
Perform enrichment after a window based analytic process.
Scale Joule to leverage Kafka topic partitioning.
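The filtering and windowing recommendations above amount to ordering the pipeline so the enricher sees as few events as possible. A rough sketch of that ordering is below; the processor names and expression syntax are hypothetical placeholders, not confirmed Joule DSL.

```yaml
# Illustrative pipeline ordering — processor names and expressions are hypothetical.
pipeline:
  - filter:                         # drop irrelevant events first to cut read load
      expression: "amount > 1000"
  - time window:                    # aggregate before enriching, so enrichment
      type: tumbling                # runs once per window rather than per event
      size: 60s
  - enricher:                       # enrichment now performs far fewer store reads
      fields:
        ...
```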
If the above approach does not resolve your issue, there is still a reliable alternative: embedding a caching solution, such as a key-value store, directly within the process.
Joule provides such a solution in the form of Apache Geode, an enterprise-grade technology that has been battle-tested over many years in demanding high-frequency, low-latency environments.
Further information on how to use this technology can be found here.