Serialisation
Data emitted to downstream systems is serialised using provided or custom serialisers
Overview
Joule provides event formatters and serialisers to transform StreamEvent objects into structured formats for stream processing, visualisation or analytics. These tools enable seamless integration with diverse systems by converting raw data into formats such as JSON, CSV or Parquet, tailored to downstream needs.
For advanced use cases, the CustomTransformer API allows developers to implement custom transformations, mapping StreamEvent objects to domain-specific types or unique formats not supported by the standard serialisers. This flexibility ensures Joule can meet specialised data integration requirements.
Each connector type has its own preferred and supported serialisation methods.
This page serves as a guide for configuring data serialisation and formatting, with examples and options for effective data integration in Joule.
This page includes the following:
- Serialisation DSL element: Describes the serialisation DSL element, which defines how StreamEvents are serialised for downstream systems, using formatters such as JSON, CSV and Parquet.
- Kafka example: Provides an example Kafka publisher configuration that serialises StreamEvents as JSON events.
- Custom transformer API: Covers the CustomTransformer interface for custom serialisation when unique data transformations are required.
- Avro serialiser: Introduces how AVRO can be used to transform Joule events into target domain events using a provided AVRO IDL schema file.
- Formatters: Outlines the available formatters (JSON, CSV, Parquet) with example configurations for data storage needs.
Serialisation DSL element
The serializer DSL element defines how StreamEvents should be serialised for downstream consumption, specifying attributes such as data format and target configuration.
For example, when using Kafka to publish data, you can use a JSON formatter to structure StreamEvents into JSON objects, enabling efficient data exchange across multiple processes in a system topology.
Example
This code snippet shows a basic setup for a Kafka publisher within Joule, where StreamEvent objects are converted to JSON and published to a target topic.
This setup enables Joule inter-process communication in a larger use case topology, where Joule processes communicate by publishing and subscribing to events.
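The configuration snippet referenced above is not reproduced on this page. The sketch below is a minimal, illustrative reconstruction: the `kafka publisher`, `cluster address` and `topic` keys are assumptions based on typical Joule connector configuration, while `serializer`, `formatter`, `batch` and `compress` follow the attributes schema described below.

```yaml
kafka publisher:
  cluster address: localhost:9092      # assumed connection attribute; see the Kafka publisher documentation
  topic: analytics_events              # assumed target topic name
  serializer:
    formatter: json                    # convert each StreamEvent to a JSON object (default formatter)
    batch: false                       # publish one event per payload
    compress: false                    # no payload compression
```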
Attributes schema
| Attribute | Description | Type |
| --- | --- | --- |
| transformer | Convert a StreamEvent to a target domain type using a provided CustomTransformer implementation | |
| formatter | Convert a StreamEvent to a target format such as JSON, CSV or Object | Default: JSON |
| compress | Compress the resulting serialised data for efficient network transfer | Boolean. Default: false |
| batch | Batch events into a single payload | Boolean. Default: false |
| properties | Specific properties required for a custom serialisation process | Properties |
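As an illustrative fragment only (not an official reference), a serializer element combining these attributes might look like the following; the keys under properties are placeholders for whatever a custom serialisation process requires.

```yaml
serializer:
  formatter: csv            # alternative to the JSON default, e.g. csv or parquet
  compress: true            # compress the serialised payload before transfer
  batch: true               # group multiple events into a single payload
  properties:
    charset: UTF-8          # placeholder property for a custom serialisation process
```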
Custom transformer API
Convert a StreamEvent to a target domain type using a custom transformer developed using the Developer JDK.
Two methods are provided:
- CustomTransformer API: For code-based transformations.
- AVRO serialisation: For automated event translation.
Object CustomTransformer API
This capability is used when the use case requires a specific domain data type for downstream consumption. Developers are expected to provide domain-specific implementations of the CustomTransformer interface.
Refer to the custom transform example documentation for detailed instructions on implementing a custom transformer.
Example
The following example converts the StreamEvent to a StockAnalyticRecord using custom code. This data object is then converted into a JSON object using the Kafka serialisation framework.
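The accompanying code and configuration are covered in the custom transform example documentation; the fragment below is only a hedged sketch of how such a transformer might be wired into the serializer element. The class name com.example.StockAnalyticRecordTransformer is purely illustrative.

```yaml
kafka publisher:
  topic: stock_analytics                                      # assumed target topic
  serializer:
    transformer: com.example.StockAnalyticRecordTransformer   # illustrative CustomTransformer implementation
    formatter: json                                           # the resulting StockAnalyticRecord is emitted as JSON
```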
AVRO serialisation
This is an AVRO transform implementation utilising the CustomTransformer API. To simplify usage, an avro serializer DSL element is provided, along with configurable attributes.
The transformer automatically maps StreamEvent attributes on to the target domain type attributes using a provided AVRO schema IDL. Currently only local schema files are supported, with schema registry support available on request.
Integrate Joule with any existing system using already-established data structures.
Example
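A minimal sketch, assuming the schema and field mapping attributes listed in the schema below; the schema path and field names are placeholders.

```yaml
serializer:
  avro serializer:
    schema: ./schemas/stock-analytic.avdl    # local AVRO IDL file (placeholder path)
    field mapping:
      symbol: ticker                         # StreamEvent attribute -> target schema field (illustrative)
      ask_price: askPrice
```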
Attributes schema
| Attribute | Description | Data Type | Required |
| --- | --- | --- | --- |
| schema | Path and name of the schema file | String | |
| field mapping | Custom mapping of source event attributes to target schema attributes | Map<String,String> | |
Data formatters
Formatters are mainly used for direct storage, whereby data tools such as PySpark, Apache Presto, DuckDB, MongoDB, Postgres and MySQL can use the data directly without further transformation overhead.
Available Implementations
See data formatters documentation for configuration options.
JSON formatter
CSV formatter
Parquet formatter
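As a rough illustration of how a formatter is selected within the serializer element (refer to the data formatters documentation for the actual options and attribute names):

```yaml
serializer:
  formatter: parquet        # write events in Parquet for tools such as DuckDB or PySpark (illustrative)
  batch: true               # columnar formats generally benefit from batched payloads (assumption)
```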