In-Motion Reference Data Enrichment
Streaming events are enriched with external data through a low-latency data caching platform.
Enriching event data with the latest reference data is crucial for processing real-time business insights. However, as reference data typically resides in a data mart or warehouse and changes slowly, integrating it into a high-throughput streaming platform can lead to processing latency challenges. This is often attributed to the I/O overhead induced by the network and storage layer.
Joule offers a solution to eliminate this latency by providing an embedded cache within the process. Upon Joule startup, developers can define the necessary datasets and customise the caching infrastructure's behaviour. This enterprise-level production approach supports advanced use cases, delivering real-time insights and triggering business decisions with agility.
Architecture
The Joule implementation architecture binds a Geode client cache within the process. This allows live data to be supplied by a managed service. The service would provide the necessary upsert hooks from the database to the distributed cache, which in turn would update the client cache using operational functions (e.g. invalidate, expire, GII (get initial image), cache miss).
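The flow described above can be sketched in a few lines. This is a minimal illustration, not the Geode or Joule API: the `ClientCache` class, its method names, and the `distributed` dictionary standing in for the managed service's distributed cache are all hypothetical.

```python
from typing import Callable, Dict, Optional


class ClientCache:
    """Hypothetical in-process cache; not the Geode or Joule API."""

    def __init__(self, load_on_miss: Callable[[str], Optional[dict]]):
        self._entries: Dict[str, dict] = {}
        # Stands in for a fetch from the distributed cache.
        self._load_on_miss = load_on_miss

    def get_initial_image(self, snapshot: Dict[str, dict]) -> None:
        # Bulk-load the reference dataset at startup (the GII step).
        self._entries = dict(snapshot)

    def upsert(self, key: str, value: dict) -> None:
        # Applied when the service pushes a database change.
        self._entries[key] = value

    def invalidate(self, key: str) -> None:
        # Drop the local copy so the next read refetches it.
        self._entries.pop(key, None)

    def get(self, key: str) -> Optional[dict]:
        if key not in self._entries:
            value = self._load_on_miss(key)  # cache miss path
            if value is None:
                return None
            self._entries[key] = value
        return self._entries[key]


# Stand-in for the distributed cache held by the managed service.
distributed = {"ACME": {"name": "Acme Corp", "sector": "Industrials"}}

cache = ClientCache(load_on_miss=distributed.get)
cache.get_initial_image(distributed)

# A database change reaches the distributed cache, which invalidates
# the client copy; the next lookup reloads the fresh value.
distributed["ACME"] = {"name": "Acme Corp", "sector": "Energy"}
cache.invalidate("ACME")
print(cache.get("ACME")["sector"])  # prints "Energy"
```

Once the initial image is loaded, reads are served from process memory, and only a miss or an invalidated entry goes back to the distributed cache.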
In-Motion Enrichment DSL
The enricher processor gives users the ability to enrich an event with multiple data elements through the use of enhanced mapping.
Example
Key based enrichment
The key based lookup approach requires a target key-value store to be defined for the criteria. Out of the box, Joule provides an Apache Geode connector which uses an embedded client cache.
Return reference data as a linked map of key-value pairs
Return reference data as a linked object
Attributes
Attribute | Description | Data Type |
---|---|---|
by key | This uses the value within the passed event as a lookup key on the linked store | String |
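
To make the two return forms concrete, here is a minimal sketch of a key based lookup. It is not the Joule DSL: the `enrich_by_key` function, the `target` and `as_object` parameters, the `Company` type, and the sample data are all assumptions made for illustration. Only the `by_key` idea (use a value from the event as the lookup key on the linked store) comes from the table above.

```python
from dataclasses import dataclass
from typing import Any, Dict, Mapping


@dataclass(frozen=True)
class Company:
    """Hypothetical reference data type."""
    name: str
    sector: str


def enrich_by_key(
    event: Mapping[str, Any],
    store: Mapping[str, Dict[str, Any]],
    by_key: str,
    target: str,
    as_object: bool = False,
) -> Dict[str, Any]:
    """Return a copy of the event with reference data linked under `target`."""
    # The value held in the event field named by `by_key` is the lookup key.
    record = store.get(event.get(by_key))
    enriched = dict(event)
    if record is None:
        enriched[target] = None               # no reference data found
    elif as_object:
        enriched[target] = Company(**record)  # linked object
    else:
        enriched[target] = dict(record)       # linked map of key-value pairs
    return enriched


store = {"ACME": {"name": "Acme Corp", "sector": "Industrials"}}
event = {"symbol": "ACME", "price": 101.5}

as_map = enrich_by_key(event, store, by_key="symbol", target="company")
as_obj = enrich_by_key(event, store, by_key="symbol", target="company", as_object=True)
print(as_map["company"]["sector"], as_obj["company"].sector)
```

The sketch returns a new dictionary rather than mutating the incoming event, which keeps the original event available to other processors.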
Query based enrichment
The query based lookup approach requires a target store to be defined for the query. Out of the box, Joule provides an Apache Geode connector which uses an embedded client cache.
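As a rough illustration of how query based enrichment differs from a single key lookup, the sketch below selects every reference record that satisfies a condition bound to the event. It is plain Python standing in for a query engine: `enrich_by_query`, the `contracts` data, and the `target` parameter are hypothetical and not part of Joule or Geode.

```python
from typing import Any, Callable, Dict, Iterable

Record = Dict[str, Any]


def enrich_by_query(
    event: Record,
    region: Iterable[Record],
    predicate: Callable[[Record, Record], bool],
    target: str,
) -> Record:
    """Return a copy of the event with all matching records under `target`."""
    matches = [dict(r) for r in region if predicate(r, event)]
    enriched = dict(event)
    enriched[target] = matches
    return enriched


# Hypothetical reference data held in a cache region.
contracts = [
    {"symbol": "ACME", "venue": "A", "lot_size": 100},
    {"symbol": "ACME", "venue": "B", "lot_size": 10},
    {"symbol": "GLOBEX", "venue": "A", "lot_size": 50},
]

# Roughly what an OQL query such as
#   SELECT * FROM /contracts c WHERE c.symbol = $1
# would express, with $1 bound to the event's symbol.
enriched = enrich_by_query(
    {"symbol": "ACME", "price": 101.5},
    contracts,
    lambda record, event: record["symbol"] == event["symbol"],
    target="contracts",
)
print(len(enriched["contracts"]))  # prints 2
```

A query can return zero, one, or many records, so the sketch links a list to the event rather than a single map or object.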
OQL Note
Joule uses the Apache Geode platform for its in-motion data solution. This is an enterprise-grade approach that is well understood and fully battle tested.
Geode uses a query syntax based on OQL (Object Query Language) to query region data. OQL and SQL have many syntactical similarities, but they also have significant differences. For example, while OQL does not offer all of the capabilities of SQL, such as aggregates, it does allow you to execute queries on complex object graphs, query object attributes and invoke object methods.
See Apache Geode OQL documentation for further details.