Joule
  • Welcome to Joule's Docs
  • Why Joule?
    • Joule capabilities
  • What is Joule?
    • Key features
    • The tech stack
  • Use case enablement
    • Use case building framework
  • Concepts
    • Core concepts
    • Low code development
    • Unified execution engine
    • Batch and stream processing
    • Continuous metrics
    • Key Joule data types
      • StreamEvent object
      • Contextual data
      • GeoNode
  • Tutorials
    • Getting started
    • Build your first use case
    • Stream sliding window quote analytics
    • Advanced tutorials
      • Custom missing value processor
      • Stateless Bollinger band analytics
      • IoT device control
  • FAQ
  • Glossary
  • Components
    • Pipelines
      • Use case anatomy
      • Data priming
        • Types of import
      • Processing unit
      • Group by
      • Emit computed events
      • Telemetry auditing
    • Processors
      • Common attributes
      • Filters
        • By type
        • By expression
        • Send on delta
        • Remove attributes
        • Drop all events
      • Enrichment
        • Key concepts
          • Anatomy of enrichment DSL
          • Banking example
        • Metrics
        • Dynamic contextual data
          • Caching architecture
        • Static contextual data
      • Transformation
        • Field Tokeniser
        • Obfuscation
          • Encryption
          • Masking
          • Bucketing
          • Redaction
      • Triggers
        • Change Data Capture
        • Business rules
      • Stream join
        • Inner stream joins
        • Outer stream joins
        • Join attributes & policy
      • Event tap
        • Anatomy of a Tap
        • SQL Queries
    • Analytics
      • Analytic tools
        • User defined analytics
          • Streaming analytics example
          • User defined analytics
          • User defined scripts
          • User defined functions
            • Average function library
        • Window analytics
          • Tumbling window
          • Sliding window
          • Aggregate functions
        • Analytic functions
          • Stateful
            • Exponential moving average
            • Rolling Sum
          • Stateless
            • Normalisation
              • Absolute max
              • Min max
              • Standardisation
              • Mean
              • Log
              • Z-Score
            • Scaling
              • Unit scale
              • Robust Scale
            • Statistics
              • Statistic summaries
              • Weighted moving average
              • Simple moving average
              • Count
            • General
              • Euclidean
        • Advanced analytics
          • Geospatial
            • Entity geo tracker
            • Geofence occupancy trigger
            • Geo search
            • IP address resolver
            • Reverse geocoding
            • Spatial Index
          • HyperLogLog
          • Distinct counter
      • ML inferencing
        • Feature engineering
          • Scripting
          • Scaling
          • Transform
        • Online predictive analytics
        • Model audit
        • Model management
      • Metrics engine
        • Create metrics
        • Apply metrics
        • Manage metrics
        • Priming metrics
    • Contextual data
      • Architecture
      • Configuration
      • MinIO S3
      • Apache Geode
    • Connectors
      • Sources
        • Kafka
          • Ingestion
        • RabbitMQ
          • Further RabbitMQ configurations
        • MQTT
          • Topic wildcards
          • Session management
          • Last Will and Testament
        • Rest endpoints
        • MinIO S3
        • File watcher
      • Sinks
        • Kafka
        • RabbitMQ
          • Further configurations
        • MQTT
          • Persistent messaging
          • Last Will and Testament
        • SQL databases
        • InfluxDB
        • MongoDB
        • Geode
        • WebSocket endpoint
        • MinIO S3
        • File transport
        • Slack
        • Email
      • Serialisers
        • Serialisation
          • Custom transform example
          • Formatters
        • Deserialisers
          • Custom parsing example
    • Observability
      • Enabling JMX for Joule
      • Meters
      • Metrics API
  • DEVELOPER GUIDES
    • Setting up developer environment
      • Environment setup
      • Build and deploy
      • Install Joule
        • Install Docker demo environment
        • Install with Docker
        • Install from source
        • Install Joule examples
    • Joulectl CLI
    • API Endpoints
      • Mangement API
        • Use case
        • Pipelines
        • Data connectors
        • Contextual data
      • Data access API
        • Query
        • Upload
        • WebSocket
      • SQL support
    • Builder SDK
      • Connector API
        • Sources
          • StreamEventParser API
        • Sinks
          • CustomTransformer API
      • Processor API
      • Analytics API
        • Create custom metrics
        • Define analytics
        • Windows API
        • SQL queries
      • Transformation API
        • Obfuscation API
        • FieldTokenizer API
      • File processing
      • Data types
        • StreamEvent
        • ReferenceDataObject
        • GeoNode
    • System configuration
      • System properties
  • Deployment strategies
    • Deployment Overview
    • Single Node
    • Cluster
    • GuardianDB
    • Packaging
      • Containers
      • Bare metal
  • Product updates
    • Public Roadmap
    • Release Notes
      • v1.2.0 Join Streams with stateful analytics
      • v1.1.0 Streaming analytics enhancements
      • v1.0.4 Predictive stream processing
      • v1.0.3 Contextual SQL based metrics
    • Change history
Powered by GitBook
On this page
  • What will we learn on this article?
  • Enrichment architecture
  • Architecture components
  • Supported data stores
  • Worked example
  • Key concerns
  • Underperformance characteristics
  • First steps to consider
  • Switch to a Key-Value store

Was this helpful?

  1. Components
  2. Processors
  3. Enrichment

Key concepts

Enriching events with the latest contextual data is crucial for processing advanced real-time business insights

PreviousEnrichmentNextAnatomy of enrichment DSL

Last updated 5 months ago

Was this helpful?

What will we learn on this article?

This article introduces high-level concepts about the enrichment processor, providing a foundational understanding of key topics and how they are used.

We will learn how Joule enables the linkage of contextual data to stream events within a processing pipelines that subscribe to source data and publish results.

This article covers:

  1. Enrichment architecture An overview of enrichment architecture components and how they work together.

  2. Supported data stores List of supported enrichment data stores.

  3. Worked example Learn how to apply enrichment within a stream processing pipeline.

  4. Key concerns Selecting the optimal data store for read heavy enrichment is a key concern to get right.

Enrichment architecture

Out-of-the-box Joule provides a low-latency event enrichment solution for advanced use cases.

This core feature enables the user to focus on composing data requirements using a pragmatic enrichment query DSL.

The Joule contextual data architecture supports various low-latency in-memory data stores that reduce the inherent I/O overhead of out-of-the-box process databases and thus enable stream based enrichment.

Note contextual data refers is a generic term to refer to any associative data or calculated data point that can provide additional context to an event.

Architecture components

The architecture components work together to provide the necessary functions for a high-performance enrichment process.

  • User enrichment DSL Enables the user to define a field level enrichment query, response type and the data source using a flexible syntax.

  • Contextual data interface Provides the standardised access abstraction over the underlying data sources

  • In-memory data stores Access to the internal high-performance in-memory data store and also a reference implementation of a proven enterprise data cluster solution.

  • S3 data access Standardised S3 data access for cross cloud collaboration

Supported data stores

These are the supported out-of-the-box enrichment stores Joule provides:

  • Joule DB

  • S3

  • Distributed caching cluster

Worked example

The banking example demonstrates how to enrich events with both contextual data from an imported data set and computed metrics. Both forms use the internal in-memory database.

Code snippet

enricher:
  fields:
    company_info:
      by query: "select * from reference_data.nasdaq_companies where Symbol = ?"
      query fields: [ symbol ]
      with values: [ Name,Country ]
      using: JouleDB
      
    quote_metrics:
      by metric family: BidMovingAverage
      by key: symbol
      with values: [avg_bid_min, avg_bid_avg, avg_bid_max]
      using: MetricsDB

Key concerns

Enriching events is an expense read operation and it becomes apparent during high frequency event processing. If the enrichment processor is not configured optimally at worst Joule will fail to produce results due to heavy read operations.

Underperformance characteristics

For example, when Joule starts to emit a slow stream of events versus a high number of incoming events with minimal complex processing, there are generally repeatable processing indicators that can be observed.

  1. Slow stream of output events:

    • Output events are produced at a significantly lower rate.

    • Typically indicates high read operations occurring due to constant database queries in the enrichment processor.

  2. Joule process failure:

    • Large volume of events arriving continuously causes available memory to be consumed.

    • This results with the JVM to fail with an OutOfMemoryError and will exit.

First steps to consider

If you are using the internal in-memory database the following approaches are the first steps towards a better performing solutions

  • Set a unique index on contextual tables

  • Reduce the number of events to be enriched by applying filtering earlier in the process.

  • Perform enrichment after a window based analytic process.

  • Scale Joule to leverage Kafka topic partitioning.

Switch to a Key-Value store

If the above approach does not resolve your issue, there is still a reliable alternative: embedding a caching solution, such as a key-value store, directly within the process.

See documentation for a complete worked solution.

Ensure you are using in-memory database only and set joule.db.memoryOnly=true , see .

Joule provides such a solution in the form of . This is an enterprise grade solution which has been battle test over many years and in many demanding high frequency and low latency environments.

Further information on how to use this technology can be found .

Metrics engine
Banking example
system properties
Apache Geode
here
Contextual data interface