Batch and stream processing

Unified platform for batch and stream processing

Last updated 4 months ago

What will we learn in this article?

This article introduces the foundational concepts of data processing within the Joule platform, highlighting its seamless integration of batch and stream processing into a unified system.

By the end, we will gain a clear understanding of Joule's processing capabilities and how they connect to its broader functionality.

Joule does not distinguish between stream and batch data; it applies the same processing techniques to both.

Processing types

Batch processing

Batch processing operates on bounded datasets, handling data in discrete chunks or sets. It is ideal for scenarios like periodic reporting, where large datasets are processed at once.

Joule adopts a micro-batching method, treating batch data as a stream of events internally. This unified approach allows batch jobs to operate seamlessly alongside real-time streams.
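
The micro-batching idea can be sketched in a few lines of illustrative Python. This is a conceptual sketch only, not Joule's API: it shows how a bounded dataset can be consumed as a stream of small event batches, so batch and stream jobs share the same consumption loop.

```python
from itertools import islice

def micro_batches(events, batch_size=100):
    """Yield a bounded dataset as a stream of small event batches.

    Conceptual illustration only; Joule's internal batching is not
    exposed through an interface like this.
    """
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# A "batch" job and a "stream" job consume events the same way:
for batch in micro_batches(range(10), batch_size=4):
    print(batch)
```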

To ensure efficient large-scale data handling, Joule leverages Apache Arrow for memory optimisation and fast file handling. With unified execution, batch jobs can also include real-time triggers, enabling dynamic, mixed-mode operations.

Stream processing

Stream processing operates on unbounded datasets, handling data in real time, event by event.

Joule’s core strength lies in its ability to process data streams dynamically, enabling tasks like transformations, enrichment and predictive analytics with low latency.

Joule offers near-real-time analytics as data continuously arrives. It has built a custom analytic processing engine and integrates DuckDB for high-performance internal data storage, providing efficient, high-throughput streaming analytics.

This approach allows data to flow through pipelines without delays caused by waiting for complete datasets, making it ideal for low-latency use cases like predictive analytics or real-time dashboards.
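
The event-by-event model described above can be illustrated with a minimal sketch. This is not Joule's actual API; the `stream_processor`, `transform` and `sink` names are hypothetical, and the point is only that each event is transformed and emitted immediately, without waiting for a complete dataset.

```python
def stream_processor(source, transform, sink):
    """Process an unbounded source event by event.

    Illustrative sketch: each event flows through the pipeline
    and is delivered to the sink with no batching delay.
    """
    for event in source:
        sink(transform(event))

results = []
quotes = iter([{"symbol": "ABC", "price": 10.0},
               {"symbol": "ABC", "price": 10.5}])
stream_processor(
    quotes,
    transform=lambda e: {**e, "price_cents": int(e["price"] * 100)},
    sink=results.append,
)
```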

How is this applied in Joule?

Data ingestion

Joule connects seamlessly to a variety of event sources, enabling continuous or periodic data ingestion.

These sources include:

  • Kafka for partitioned data streaming

  • RabbitMQ for mixed modes of streaming architectures

  • MQTT for lightweight messaging

  • MinIO S3 for cloud-based storage

  • File Watcher for monitoring file changes

  • REST endpoints for lightweight HTTP-based integration

This flexibility ensures that Joule can integrate with diverse systems to gather the necessary input for executing pipelines.

Stream processors

At the heart of Joule’s functionality are its stream processors, which perform distinct tasks such as:

  1. Data enrichment.

  2. Transformations.

  3. Real-time predictions.

  4. Event window analytics.

These processors can be chained together into modular pipelines, allowing businesses to design workflows tailored to specific needs.

For example, processors can normalise incoming data, aggregate trends over time, or generate predictive insights, enabling the creation of scalable and flexible event-driven use cases.
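
The chaining idea can be sketched as simple function composition. This is an illustration only; the `pipeline` helper below is hypothetical, and real Joule pipelines are declared as configuration rather than composed in code.

```python
from functools import reduce

def pipeline(*processors):
    """Chain processors so each event flows through every stage in order."""
    def run(event):
        return reduce(lambda ev, proc: proc(ev), processors, event)
    return run

# Two toy processors: normalise a raw value, then flag high readings.
normalise = lambda e: {**e, "value": e["value"] / 100}
flag_high = lambda e: {**e, "high": e["value"] > 0.5}

process = pipeline(normalise, flag_high)
process({"value": 75})   # -> {"value": 0.75, "high": True}
```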

Data delivery

Once data is processed, Joule integrates with downstream systems using its flexible data sinks.

These include SQL databases for structured storage, InfluxDB for time-series analytics, Kafka for redistributing processed streams, WebSocket systems for real-time dashboards and file outputs for exporting data in custom formats.

This ensures that the processed data is delivered to the right systems to provide maximum business value.

Unified processing

Joule’s unified engine enables seamless integration of batch and stream processing within a single platform. Mixed-mode pipelines allow businesses to process historical data and live streams simultaneously.

For instance, a batch job could generate periodic reports from historical datasets while triggering real-time alerts based on live data. This combination enhances operational flexibility, making Joule suitable for a wide range of applications, from real-time analytics to long-term trend reporting.
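
The symmetry behind mixed-mode pipelines can be shown with one piece of logic applied unchanged to a bounded (historical) input and an unbounded (live) one. The names below are assumptions for illustration; in Joule the unified execution engine provides this symmetry without user code.

```python
def alert_on_threshold(events, threshold, on_alert):
    """Run the same alerting logic over any iterable of events,
    whether it is a replayed historical dataset or a live feed."""
    for event in events:
        if event["value"] > threshold:
            on_alert(event)

alerts = []
historical = [{"ts": 1, "value": 3}, {"ts": 2, "value": 9}]
alert_on_threshold(historical, threshold=5, on_alert=alerts.append)
# alerts now holds only the event whose value exceeded the threshold
```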

Extensibility

Joule's Processor SDK allows developers to build custom processors, extending its capabilities to meet unique business requirements.

By combining advanced processing techniques, extensibility and unified execution, Joule offers a comprehensive solution for managing complex data workflows, empowering businesses to handle everything from real-time event streams to large-scale batch processing with ease.
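
A custom processor typically follows a simple lifecycle: initialise resources, handle each event, then clean up. Joule's real Processor API lives in the Java-based Builder SDK; the class below is a hypothetical Python sketch of that shape, not the actual interface.

```python
class SequenceTagger:
    """Hypothetical custom processor: tags each event with a sequence number.

    Lifecycle mirrors a typical processor contract:
    initialise -> process (per event) -> close.
    """
    def initialise(self):
        self.count = 0

    def process(self, event):
        self.count += 1
        return {**event, "seq": self.count}

    def close(self):
        pass  # release any held resources here

p = SequenceTagger()
p.initialise()
p.process({"id": "a"})   # -> {"id": "a", "seq": 1}
p.close()
```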
