
Unified execution engine

Unified engine for real-time and batch data processing


Last updated 5 months ago


What will we learn in this article?

This article explores Joule's unified execution engine, covering how it addresses the challenges of real-time and batch data processing within a single, integrated engine.

We will gain insight into Joule’s architecture and how it manages continuous and periodic data without distinction between batch and streaming.

Joule shows how a unified approach overcomes the issues of traditional data processing frameworks: it eliminates the requirement for data to be complete at time of ingestion by introducing components such as stream joins, filters, enrichments, windows, and transformations.

What is a unified execution engine?

A unified execution engine processes both unbounded (continuous) and bounded (finite) data without requiring developers to differentiate between them. We will not cover what unbounded and bounded data are in depth here; this article gives an overview of the concepts.
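
As a rough illustration of the idea (plain Python, not Joule's actual API), the same processing logic can consume a bounded dataset or an unbounded feed without any change to the code:

```python
import itertools
from typing import Iterable, Iterator

def running_total(events: Iterable[int]) -> Iterator[int]:
    """Emit a cumulative sum after every event, regardless of
    whether the source is bounded or unbounded."""
    total = 0
    for value in events:
        total += value
        yield total

# Bounded source: a finite list (a "batch").
print(list(running_total([1, 2, 3])))  # [1, 3, 6]

# Unbounded source: a generator that could run forever;
# downstream we simply take as many results as we need.
def sensor_feed() -> Iterator[int]:
    n = 0
    while True:          # never terminates on its own
        n += 1
        yield n

print(list(itertools.islice(running_total(sensor_feed()), 3)))  # [1, 3, 6]
```

The point is that `running_total` never asks whether its input is "complete"; completeness is a property of the source, not of the processing logic.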

Ideally, we would like insights as soon as the initial event occurs. However, this depends entirely on the frequency at which data is presented to the processing engine and on the actual needs of the use case.

Low-latency event feeds generally produce a faster time to insight, whereas bounded data delivers snapshots that reflect a point-in-time view. When processing both types of data, we therefore need to balance the ideal state against the actual requirements of reality.

In an ideal world

Processing = event creation

Processing would happen instantly as events are created.

In reality

Processing ≠ event creation

Event processing must wait for new events to enter the pipeline before it can start generating actionable results.

Joule does not differentiate between bounded and unbounded data

Adapting to continuous data with a flexible unified model

Unlike traditional data processing frameworks, which assume data will eventually become complete, a unified model operates on the assumption that new data may always arrive.

This approach enables flexibility by not tying data infrastructure to specific execution engines and by providing consistency across both unbounded and bounded datasets.

Joule's processors allow developers to specify when to emit output results for a given period of time, enabling responsive processing even in continuous workflows.

For example, with Joule you can spin up multiple processors, each with its own scheduler, to run independently within the same environment. Unlike traditional setups, where a single scheduler manages all tasks, Joule enables separate, decoupled processing for different use cases.
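
A minimal sketch of that idea in plain Python (the class and its names are hypothetical, not Joule's API): two processors consume the same event stream, but each emits on its own independent schedule.

```python
from typing import Callable, List

class Processor:
    """Toy processor with its own emit schedule: it buffers events
    and emits an aggregate every `emit_every` events."""
    def __init__(self, name: str, emit_every: int,
                 aggregate: Callable[[List[float]], float]):
        self.name = name
        self.emit_every = emit_every
        self.aggregate = aggregate
        self.buffer: List[float] = []
        self.emitted: List[float] = []

    def on_event(self, value: float) -> None:
        self.buffer.append(value)
        if len(self.buffer) == self.emit_every:   # this processor's schedule fires
            self.emitted.append(self.aggregate(self.buffer))
            self.buffer.clear()

# Two independent processors over the same stream, with decoupled schedules.
fast = Processor("fast-sum", emit_every=2, aggregate=sum)
slow = Processor("slow-avg", emit_every=4,
                 aggregate=lambda xs: sum(xs) / len(xs))

for event in [1, 2, 3, 4, 5, 6, 7, 8]:
    fast.on_event(event)
    slow.on_event(event)

print(fast.emitted)  # [3, 7, 11, 15]
print(slow.emitted)  # [2.5, 6.5]
```

Neither processor knows or cares about the other's cadence; each decides for itself when its results are ready to emit.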

Understanding data processing with Joule

Joule does not differentiate between batch and stream processing. For a more in-depth description of how Joule treats data processing, see Batch and stream processing.

Batch processing

Batch processing in a unified engine handles large, finite datasets processed periodically. Joule manages batch data by applying a micro-batching method, so batches appear internally as a stream of events.

With unified execution, batch jobs can also include real-time triggers, allowing them to operate seamlessly alongside streaming data.
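
To make micro-batching concrete, here is a plain-Python sketch of the concept (not Joule's internals): a finite dataset is sliced into small batches that flow through the same per-event logic a streaming source would feed.

```python
from typing import Iterable, Iterator, List

def micro_batches(dataset: List[dict], batch_size: int) -> Iterator[List[dict]]:
    """Slice a finite dataset into micro-batches so it can flow
    through the engine like any other stream of events."""
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]

def process(event_stream: Iterable[dict]) -> List[dict]:
    # The same per-event logic a streaming source would feed.
    return [e for e in event_stream if e["price"] > 100]

rows = [{"symbol": "A", "price": 99}, {"symbol": "B", "price": 150},
        {"symbol": "C", "price": 120}, {"symbol": "D", "price": 80},
        {"symbol": "E", "price": 101}]

results = []
for batch in micro_batches(rows, batch_size=2):   # a "batch job"...
    results.extend(process(batch))                # ...processed as a stream

print([e["symbol"] for e in results])  # ['B', 'C', 'E']
```

Because the processing logic only ever sees a stream of events, the same code serves both the batch and streaming paths.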

Stream processing

This approach lets streaming data flow through the pipeline as it arrives, without waiting for a complete dataset, avoiding the delays that a continuous event inflow would otherwise cause.
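
A simple way to picture this (a plain-Python sketch, not Joule's API) is an incremental aggregate that updates and emits on every arriving event, rather than waiting for the feed to end:

```python
from typing import Iterable, Iterator, Tuple

def streaming_mean(events: Iterable[float]) -> Iterator[Tuple[int, float]]:
    """Update and emit the running mean as each event arrives --
    no need to wait for the dataset to be 'complete'."""
    count, total = 0, 0.0
    for value in events:
        count += 1
        total += value
        yield count, total / count   # insight is available immediately

for seen, mean in streaming_mean([10.0, 20.0, 60.0]):
    print(f"after {seen} event(s): mean = {mean:.1f}")
# after 1 event(s): mean = 10.0
# after 2 event(s): mean = 15.0
# after 3 event(s): mean = 30.0
```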

How does this work in Joule?

Joule treats batch and streaming data consistently, allowing seamless processing across different data sources and formats.

Joule’s architecture supports modular pipelines, real-time observability and extensibility. This makes Joule adaptable to diverse data processing demands.
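
The modular-pipeline idea can be sketched in plain Python (stage names and the lookup table here are hypothetical, not Joule's API): independent stages are composed into one pipeline through which events flow.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

Stage = Callable[[Iterable[dict]], Iterator[dict]]

def filter_stage(events: Iterable[dict]) -> Iterator[dict]:
    # Drop events we are not interested in.
    return (e for e in events if e["type"] == "trade")

def enrich_stage(events: Iterable[dict]) -> Iterator[dict]:
    # Attach contextual data (hypothetical lookup table).
    venues = {"XLON": "London", "XNYS": "New York"}
    for e in events:
        yield {**e, "venue_name": venues.get(e["venue"], "unknown")}

def pipeline(stages: list, events: Iterable[dict]) -> Iterator[dict]:
    """Compose stages left-to-right into one modular pipeline."""
    return reduce(lambda stream, stage: stage(stream), stages, events)

events = [{"type": "trade", "venue": "XLON"},
          {"type": "quote", "venue": "XNYS"},
          {"type": "trade", "venue": "XNYS"}]

out = list(pipeline([filter_stage, enrich_stage], events))
print(out)
# [{'type': 'trade', 'venue': 'XLON', 'venue_name': 'London'},
#  {'type': 'trade', 'venue': 'XNYS', 'venue_name': 'New York'}]
```

Because each stage only consumes and produces an event stream, stages can be added, removed, or reordered without touching the others.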

Unified engine architecture

Joule uses the latest processing techniques to enable large file handling while managing memory efficiently.

Stream processing operates on unbounded datasets, offering near-real-time analytics as new data continuously arrives. Joule uses Apache Arrow and DuckDB for internal data storage, enabling high-performance analytics.


Processors

Processors are the core of the Joule platform, each performing a specific task. Linked together, they create a use case.

Analytics

Analytics form the core platform feature that turns insight into value.

Connectors

Connectors integrate with external systems to consume events and publish insights.

(Diagram: Joule as the unified execution engine)