# Batch and stream processing

## What will we learn in this article?

This article introduces the foundational concepts of data processing within the Joule platform, highlighting its seamless integration of batch and stream processing into a unified system.

By the end, we will gain a clear understanding of Joule's processing capabilities and how they connect to its broader functionality.

{% hint style="info" %}
Joule does not distinguish between stream and batch processing; it applies the same processing techniques to both.
{% endhint %}

## Processing types

### **Batch processing**

Batch processing operates on bounded data and involves **handling data in chunks** or sets. It is ideal for scenarios like periodic reporting, where large datasets are processed at once.

Joule adopts a **micro-batching method**, treating batch data as a stream of events internally. This unified approach allows batch jobs to operate seamlessly alongside real-time streams.
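
The micro-batching idea can be illustrated with a small sketch, written in Python for brevity. It is not Joule's internal implementation: the functions `micro_batches`, `as_event_stream` and `process_event`, the batch size and the record fields are all hypothetical. A bounded dataset is cut into chunks, each chunk is re-emitted event by event, and the same per-event function could then serve a live stream unchanged.

```python
from itertools import islice
from typing import Iterable, Iterator, List

def micro_batches(records: Iterable, size: int) -> Iterator[List]:
    """Split a bounded dataset into fixed-size chunks; the last one may be smaller."""
    it = iter(records)
    while chunk := list(islice(it, size)):
        yield chunk

def as_event_stream(records: Iterable[dict], size: int) -> Iterator[dict]:
    """Re-emit every micro-batch event by event, so batch input looks like a stream."""
    for chunk in micro_batches(records, size):
        yield from chunk

def process_event(event: dict) -> dict:
    """Hypothetical per-event logic; the same function could be applied to a live stream."""
    return {**event, "price_usd": round(event["price"] * event["fx_rate"], 2)}

historical = [
    {"symbol": "A", "price": 10.0, "fx_rate": 1.1},
    {"symbol": "B", "price": 20.0, "fx_rate": 1.1},
    {"symbol": "C", "price": 5.0, "fx_rate": 1.1},
]
results = [process_event(e) for e in as_event_stream(historical, size=2)]
print([r["price_usd"] for r in results])  # [11.0, 22.0, 5.5]
```

Because `process_event` only ever sees one event at a time, it does not need to know whether its input came from a file or a broker.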

To ensure **efficient large-scale data handling**, Joule leverages [Apache Arrow](https://arrow.apache.org/) for memory optimisation and fast file handling. With unified execution, batch jobs can also include real-time triggers, enabling dynamic, mixed-mode operations.

### **Stream processing**

Stream processing operates on unbounded datasets. It handles data in real-time, event by event.

Joule’s core strength lies in its ability to **process data streams dynamically**, enabling tasks like transformations, enrichment and predictive analytics with low latency.

To offer near-real-time analytics as data continuously arrives, Joule uses a custom analytic processing engine and integrates [DuckDB](https://duckdb.org/) for high-performance internal data storage. Together, these provide efficient, high-throughput streaming analytics.

This approach allows data to flow through pipelines without delays caused by waiting for complete datasets, making it **ideal for low-latency use cases** like predictive analytics or real-time dashboards.
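
To make the contrast with batch concrete, the following hypothetical Python sketch computes a running mean that is updated after every event. Nothing here is a Joule API; `running_mean` is an illustrative name. Because it keeps only a count and a total, it can consume an endless source and still produce a result per event.

```python
from itertools import count, islice
from typing import Iterable, Iterator, Tuple

def running_mean(values: Iterable[float]) -> Iterator[Tuple[int, float]]:
    """Yield (events_seen, mean_so_far) after every event; only O(1) state is kept."""
    n, total = 0, 0.0
    for v in values:
        n += 1
        total += v
        yield n, total / n  # a result is available as soon as the event arrives

# count(1) never ends (1, 2, 3, ...), yet results are available immediately.
first_three = list(islice(running_mean(count(1)), 3))
print(first_three)  # [(1, 1.0), (2, 1.5), (3, 2.0)]
```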

## How is this applied in Joule?

### **Data ingestion**

Joule connects seamlessly to a variety of event sources, enabling continuous or periodic data ingestion.

These sources include:

* Standard Kafka implementations for partitioned data streaming
* RabbitMQ for mixed modes of streaming architectures
* MQTT for lightweight messaging
* MinIO S3 for cloud-based storage
* File Watcher for monitoring file changes
* Lightweight systems such as REST APIs

This flexibility ensures that Joule can integrate with diverse systems to gather the necessary input for executing pipelines.
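
One way to picture how different transports can feed the same pipeline is a common source interface. The Python sketch below is hypothetical and does not reflect Joule's connector API: `Source`, `FileRecordsSource`, `QueueSource` and `ingest` are invented for illustration, and an in-memory `queue.Queue` stands in for a real broker.

```python
import queue
from typing import Iterator, List, Protocol

class Source(Protocol):
    """Hypothetical interface: anything that can yield events."""
    def events(self) -> Iterator[dict]: ...

class FileRecordsSource:
    """Bounded source, e.g. records already read from a watched file."""
    def __init__(self, records: List[dict]):
        self.records = records

    def events(self) -> Iterator[dict]:
        yield from self.records

class QueueSource:
    """Continuous source backed by an in-memory queue, standing in for a broker such
    as Kafka or MQTT. A sentinel ends iteration so that this example terminates."""
    STOP = object()

    def __init__(self, q: queue.Queue):
        self.q = q

    def events(self) -> Iterator[dict]:
        while (item := self.q.get()) is not QueueSource.STOP:
            yield item

def ingest(source: Source) -> List[int]:
    """Downstream code depends only on the Source interface, not on the transport."""
    return [event["id"] for event in source.events()]

q = queue.Queue()
for i in (3, 4):
    q.put({"id": i})
q.put(QueueSource.STOP)

print(ingest(FileRecordsSource([{"id": 1}, {"id": 2}])))  # [1, 2]
print(ingest(QueueSource(q)))  # [3, 4]
```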

{% content-ref url="/pages/VV1P7H5l7TZRXP9RnIGe" %}
[Sources](/joule/components/connectors/sources.md)
{% endcontent-ref %}

### **Stream processors**

At the heart of Joule’s functionality are its stream processors, which perform distinct tasks such as:

1. Data enrichment.
2. Transformations.
3. Real-time predictions.
4. Event window analytics.
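
Event window analytics, the last item in the list above, typically means aggregating events over time windows. As a hedged illustration only, this Python sketch implements a tumbling (fixed, non-overlapping) window average; the function name, the `(timestamp, value)` event shape and the in-order arrival assumption are hypothetical and not taken from Joule.

```python
from typing import Iterable, Iterator, Optional, Tuple

def tumbling_avg(events: Iterable[Tuple[int, float]], window_s: int) -> Iterator[Tuple[int, float]]:
    """Emit (window_start, average) each time a fixed-size window closes.
    Assumes events arrive in timestamp order (no late-data handling)."""
    current: Optional[int] = None
    total, count = 0.0, 0
    for ts, value in events:
        start = ts - ts % window_s            # align the event to its window
        if current is not None and start != current:
            yield current, total / count      # previous window is complete
            total, count = 0.0, 0
        current = start
        total += value
        count += 1
    if count:                                 # flush the last open window at end of input
        yield current, total / count

readings = [(0, 1.0), (30, 3.0), (60, 5.0), (75, 7.0), (130, 9.0)]  # (seconds, value)
print(list(tumbling_avg(readings, window_s=60)))  # [(0, 2.0), (60, 6.0), (120, 9.0)]
```

On an unbounded stream the final flush never happens; each window is emitted as soon as the first event of the next window arrives.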

These processors can be chained together into modular pipelines, allowing businesses to design workflows tailored to specific needs.

For example, processors can normalise incoming data, aggregate trends over time, or generate predictive insights, enabling the creation of scalable and flexible event-driven use cases.
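
The chaining idea can be sketched as plain function composition. This hypothetical Python example does not use Joule's processor API; the `chain`, `enrich` and `normalise` functions, the `SENSOR_REGIONS` lookup and the event fields are assumptions made for illustration.

```python
from functools import reduce
from typing import Any, Callable, Dict

Event = Dict[str, Any]
Processor = Callable[[Event], Event]

def chain(*processors: Processor) -> Processor:
    """Compose processors left to right into one pipeline function."""
    return lambda event: reduce(lambda acc, proc: proc(acc), processors, event)

SENSOR_REGIONS = {"s1": "north", "s2": "south"}  # hypothetical reference data

def enrich(event: Event) -> Event:
    """Enrichment: attach the sensor's region from reference data."""
    return {**event, "region": SENSOR_REGIONS.get(event["sensor"], "unknown")}

def normalise(event: Event) -> Event:
    """Transformation: convert a Fahrenheit reading to Celsius."""
    return {**event, "temp_c": round((event["temp_f"] - 32) * 5 / 9, 1)}

pipeline = chain(enrich, normalise)
print(pipeline({"sensor": "s1", "temp_f": 212}))
# {'sensor': 's1', 'temp_f': 212, 'region': 'north', 'temp_c': 100.0}
```

Each processor returns a new event rather than mutating its input, so steps can be reordered or reused in other pipelines.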

{% content-ref url="/pages/21H6be81tCxZDwGJDYxq" %}
[Processors](/joule/components/processors.md)
{% endcontent-ref %}

### **Data delivery**

Once data is processed, Joule integrates with downstream systems using its flexible data sinks.

These include:

* SQL databases for structured storage
* InfluxDB for time-series analytics
* Kafka for redistributing processed streams
* WebSocket systems for real-time dashboards
* File outputs for exporting data in custom formats

This ensures that the processed data is delivered to the right systems to provide maximum business value.
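
Delivering one processed stream to several destinations is essentially a fan-out. The Python sketch below is hypothetical: `Sink`, `csv_sink`, `jsonl_sink` and `deliver` are illustrative names rather than Joule's sink API, and in-memory buffers stand in for real files, databases or WebSocket clients.

```python
import csv
import io
import json
from typing import Callable, Iterable, List

Sink = Callable[[dict], object]  # hypothetical: a sink consumes one processed event

def csv_sink(stream: io.StringIO, fields: List[str]) -> Sink:
    """File-style output in CSV format; the header is written once up front."""
    writer = csv.DictWriter(stream, fieldnames=fields)
    writer.writeheader()
    return writer.writerow

def jsonl_sink(stream: io.StringIO) -> Sink:
    """File-style output with one JSON document per line."""
    def write(event: dict) -> None:
        stream.write(json.dumps(event) + "\n")
    return write

def deliver(events: Iterable[dict], sinks: List[Sink]) -> None:
    """Fan each processed event out to every configured sink."""
    for event in events:
        for sink in sinks:
            sink(event)

processed = [{"symbol": "A", "avg": 2.0}, {"symbol": "B", "avg": 6.0}]
dashboard: List[dict] = []  # stand-in for a real-time dashboard feed
csv_out, jsonl_out = io.StringIO(), io.StringIO()
deliver(processed, [dashboard.append, csv_sink(csv_out, ["symbol", "avg"]), jsonl_sink(jsonl_out)])
print(csv_out.getvalue().splitlines())  # ['symbol,avg', 'A,2.0', 'B,6.0']
```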

{% content-ref url="/pages/1WswysHh2afn2BGijRG0" %}
[Sinks](/joule/components/connectors/sinks.md)
{% endcontent-ref %}

### **Unified processing**

Joule’s unified engine enables seamless integration of batch and stream processing within a single platform. Mixed-mode pipelines allow businesses to process historical data and live streams simultaneously.

For instance, a batch job could generate periodic reports from historical datasets while triggering real-time alerts based on live data. This combination enhances operational flexibility, making Joule suitable for a wide range of applications, from real-time analytics to long-term trend reporting.
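
The reporting-plus-alerting example can be sketched by sharing one rule between a batch path and a streaming path. This is a hypothetical Python illustration, not Joule's execution engine: `above`, `batch_report` and `live_alerts` are invented names, and the two paths run one after the other here rather than concurrently.

```python
from typing import Callable, Iterable, Iterator

Rule = Callable[[dict], bool]

def above(threshold: float) -> Rule:
    """One rule definition shared by the batch path and the streaming path."""
    return lambda event: event["temp_c"] > threshold

def batch_report(history: Iterable[dict], rule: Rule) -> dict:
    """Periodic report over a bounded, historical dataset."""
    rows = list(history)
    return {"events": len(rows), "breaches": sum(1 for e in rows if rule(e))}

def live_alerts(stream: Iterable[dict], rule: Rule) -> Iterator[str]:
    """Event-by-event alerting on live data using the same rule."""
    for event in stream:
        if rule(event):
            yield f"ALERT {event['sensor']}: {event['temp_c']}C"

rule = above(30.0)
history = [{"sensor": "s1", "temp_c": 25.0}, {"sensor": "s2", "temp_c": 31.5},
           {"sensor": "s1", "temp_c": 29.0}]
live = iter([{"sensor": "s2", "temp_c": 33.0}, {"sensor": "s1", "temp_c": 21.0}])

print(batch_report(history, rule))    # {'events': 3, 'breaches': 1}
print(list(live_alerts(live, rule)))  # ['ALERT s2: 33.0C']
```

Keeping the rule in one place means the report and the alerts cannot drift apart when the threshold changes.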

{% content-ref url="/pages/CRnyeys5Fho0lA2OdNFo" %}
[Unified execution engine](/joule/concepts/unified-execution-engine.md)
{% endcontent-ref %}

### **Extensibility**

Joule’s [Processor SDK](/joule/developer-guides/builder-sdk.md) allows developers to build custom processors, extending its capabilities to meet unique business requirements.

***

By combining advanced processing techniques, extensibility and unified execution, Joule offers a comprehensive solution for managing complex data workflows, empowering businesses to handle everything from real-time event streams to large-scale batch processing with ease.
