In-Motion Reference Data Enrichment

Joule enriches streaming events with external data through a low-latency data caching platform.


Enriching event data with the latest reference data is crucial for producing real-time business insights. However, reference data typically resides in a data mart or warehouse and changes slowly, so integrating it into a high-throughput streaming platform can introduce processing latency. This latency is often caused by the I/O overhead of the network and storage layers.

Joule addresses this latency by embedding a cache within the Joule process. Developers can define the required datasets and customise the caching infrastructure's behaviour, which take effect on Joule startup. This enterprise-level production approach supports advanced use cases, delivering real-time insights and triggering business decisions with agility.

Architecture

The Joule implementation embeds an Apache Geode client cache within the process. This allows live data to be supplied by a managed service. The service provides the necessary upsert hooks from the database to the distributed cache, which in turn updates the client cache using operational functions (e.g. invalidate, expire, GII (get initial image), cache miss).
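The interplay between an in-process cache and its backing source can be illustrated with a minimal Python sketch. It is a conceptual model only, not the Geode client API or Joule's implementation: the class `LocalReferenceCache`, the `load_from_server` callback, the TTL value, and the sample device record are all hypothetical names and data chosen for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class LocalReferenceCache:
    """Hypothetical in-process cache; a stand-in for a client cache, not the Geode API."""

    def __init__(self, load_from_server: Callable[[str], Optional[dict]],
                 ttl_seconds: float = 300.0) -> None:
        self._load = load_from_server          # called on a cache miss
        self._ttl = ttl_seconds                # entries older than this are treated as expired
        self._entries: Dict[str, Tuple[float, dict]] = {}
        self.misses = 0

    def get(self, key: str) -> Optional[dict]:
        now = time.monotonic()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]                      # served from process memory, no network I/O
        # Cache miss (or expired entry): fall back to the backing source.
        self.misses += 1
        value = self._load(key)
        if value is not None:
            self._entries[key] = (now, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a local entry so the next read fetches the updated value."""
        self._entries.pop(key, None)


# A plain dict stands in for the distributed cache fed by the database upsert hooks.
distributed = {"35000000": {"deviceManufacturer": "Acme", "year_released": 2019}}
cache = LocalReferenceCache(distributed.get)

first = cache.get("35000000")    # miss: loaded from the backing source
second = cache.get("35000000")   # hit: served locally

# An upstream update replaces the record, then the local entry is invalidated.
distributed["35000000"] = {"deviceManufacturer": "Acme", "year_released": 2020}
cache.invalidate("35000000")
third = cache.get("35000000")    # miss again: picks up the new value
```

In this model only the first read and the read after invalidation reach the backing source; every other lookup stays inside the process, which is the property the embedded client cache is meant to provide.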


In-Motion Enrichment DSL

The enricher processor lets users enrich an event with multiple data elements through enhanced mapping.

Example

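enricher:
  fields:
    # Key lookup that returns the selected attributes as a map
    deviceManufacturer:
      by key: tac
      with values: [deviceManufacturer, year_released]
      using: deviceStore

    # Key lookup that returns the stored object itself
    modelDetails:
      by key: tac
      as object: true
      using: deviceStore

    # Query lookup; the event's imsi value fills the ? placeholder
    contractedDataBundle:
      by query: "select * from /userBundle where imsi = ?"
      query fields: [imsi]
      all attributes: true
      using: dataBundleStore

  # Logical store names referenced by "using" above
  stores:
    deviceStore:
      store name: mobiledevices

    dataBundleStore:
      store name: mobilecontracts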

Key based enrichment

The key-based lookup approach requires a target key-value store to be defined for the criteria. Out of the box, Joule provides an Apache Geode connector that uses an embedded client cache.

Return reference data as a linked map of key-value pairs

deviceManufacturer:
  by key: tac
  with values: [deviceManufacturer, year_released]
  using: deviceStore

Return reference data as a linked object

modelDetails:
  by key: tac
  as object: true   
  using: deviceStore
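
The difference between the two return shapes can be sketched in a few lines of Python. This is a conceptual model under stated assumptions, not Joule's implementation: the function `enrich_by_key`, the dict-based store, the sample device record, and the choice to leave the event unchanged when no match is found are all hypothetical.

```python
from typing import Any, Dict, List, Optional


def enrich_by_key(event: Dict[str, Any],
                  target_field: str,
                  store: Dict[str, Dict[str, Any]],
                  key_field: str,
                  with_values: Optional[List[str]] = None) -> Dict[str, Any]:
    """Return a copy of the event with reference data attached under target_field."""
    enriched = dict(event)
    record = store.get(event.get(key_field))
    if record is None:
        # Assumption: an event with no matching reference data passes through unchanged.
        return enriched
    if with_values is not None:
        # "with values": attach only the selected attributes as a map.
        enriched[target_field] = {name: record.get(name) for name in with_values}
    else:
        # "as object": attach the stored object as-is.
        enriched[target_field] = record
    return enriched


# Hypothetical reference data keyed by TAC.
device_store = {
    "35000000": {"deviceManufacturer": "Acme", "model": "X1", "year_released": 2019},
}
event = {"imsi": "001", "tac": "35000000"}

as_map = enrich_by_key(event, "deviceManufacturer", device_store, "tac",
                       with_values=["deviceManufacturer", "year_released"])
as_object = enrich_by_key(event, "modelDetails", device_store, "tac")
```

Here `as_map` carries only the two requested attributes, while `as_object` carries the full stored record, mirroring the `with values` and `as object` options shown above.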

Attributes

| Attribute | Description | Data Type |
| --- | --- | --- |
| by key | Uses the value of the named field in the passed event as the lookup key on the linked store | String |

Query based enrichment

The query-based lookup approach requires a target store that can be queried. The query is written in OQL, and the event fields listed under query fields supply the values for its ? placeholders. Out of the box, Joule provides an Apache Geode connector that uses an embedded client cache.

contractedDataBundle:
  by query: "select * from /userBundle where imsi = ?"
  query fields: [imsi]
  all attributes: true
  using: dataBundleStore
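
The parameter binding implied by this configuration can be sketched in Python. This is a conceptual model rather than Joule's implementation: `enrich_by_query`, the `run_query` callback, the `fake_run_query` stand-in, and the sample bundle rows are hypothetical, and the stand-in does not parse OQL at all.

```python
from typing import Any, Callable, Dict, List


def enrich_by_query(event: Dict[str, Any],
                    target_field: str,
                    run_query: Callable[[str, List[Any]], List[Dict[str, Any]]],
                    query: str,
                    query_fields: List[str]) -> Dict[str, Any]:
    """Bind event values to the query placeholders and attach the result set."""
    # One event value per "?" placeholder, in the order the fields are listed.
    params = [event[name] for name in query_fields]
    # Naive check for the sketch: it would also count "?" inside string literals.
    if query.count("?") != len(params):
        raise ValueError("number of query fields must match number of placeholders")
    enriched = dict(event)
    enriched[target_field] = run_query(query, params)
    return enriched


# Hypothetical contract data; a real store would evaluate the OQL query itself.
bundles = [
    {"imsi": "001", "bundle": "10GB"},
    {"imsi": "002", "bundle": "2GB"},
]


def fake_run_query(query: str, params: List[Any]) -> List[Dict[str, Any]]:
    # Stand-in that ignores the query text and filters on imsi only.
    return [row for row in bundles if row["imsi"] == params[0]]


result = enrich_by_query(
    {"imsi": "001", "tac": "35000000"},
    "contractedDataBundle",
    fake_run_query,
    "select * from /userBundle where imsi = ?",
    ["imsi"],
)
```

The point of the sketch is the ordering contract: the list under `query fields` determines which event values are bound to which placeholder.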

OQL Note

Joule uses the Apache Geode platform for its in-motion data solution. This is an enterprise-grade approach that is well understood and battle-tested.

Geode uses a query syntax based on OQL (Object Query Language) to query region data. OQL and SQL have many syntactical similarities, but they also differ significantly. For example, while OQL does not offer all of the capabilities of SQL, such as aggregates, it does allow you to execute queries on complex object graphs, query object attributes, and invoke object methods.

See the Apache Geode OQL documentation for further details.
