Serialisation

Data emitted to downstream systems is serialised using provided or custom serialisers.

Overview

Joule provides event formatters and serialisers to transform StreamEvent objects into structured formats for stream processing, visualisation or analytics.

These tools enable seamless integration with diverse systems by converting raw data to formats such as JSON, CSV or Parquet, tailored to downstream needs.

For advanced use cases, the CustomTransformer API allows developers to implement custom transformations, mapping StreamEvent objects to domain-specific types or unique formats not supported by standard serialisers. This flexibility ensures Joule can meet specialised data integration requirements.

Each connector type has its own preferred serialisation method.

This page serves as a guide for configuring data serialisation and formatting, with examples and options for effective data integration in Joule.

This page includes the following:

  1. Serialisation DSL element: Describes the serialisation element, which defines how StreamEvents are serialised for downstream systems using formatters such as JSON, CSV and Parquet.

  2. Kafka example: Provides an example Kafka publisher configuration that serialises StreamEvents as JSON events.

  3. Custom transformer API: Covers the CustomTransformer interface for custom serialisation when unique data transformations are required.

  4. AVRO serialiser: Introduces how AVRO can be used to transform Joule events to target domain events using a provided AVRO IDL schema file.

  5. Formatters: Outlines the available formatters (JSON, CSV, Parquet) with example configurations for data storage needs.

Serialisation DSL element

The serializer DSL element defines how StreamEvents should be serialised for downstream consumption, specifying attributes such as data format and target configuration.

For example, when using Kafka to publish data, you can use a JSON formatter to structure StreamEvents into JSON objects, enabling efficient data exchange across multiple processes in a system topology.

Example

This code snippet shows a basic setup for a Kafka publisher within Joule, where StreamEvent objects are converted to JSON and published to a target topic.

This setup enables inter-process communication in a larger use case topology, where Joule processes communicate by publishing and subscribing to events.

kafkaPublisher:
  ...
  serializer:
    formatter:
      json formatter: {}

Attributes schema

Attribute | Description | Type
--- | --- | ---
transformer | Convert StreamEvent to a target domain type using a custom transformer developed with the Developer JDK |
formatter | Convert a StreamEvent to a target format such as JSON, CSV or Object | Formatter. Default: JSON
compress | Compress the resulting serialised data for efficient network transfer | Boolean. Default: False
batch | Batch events into a single payload | Boolean. Default: False
properties | Specific properties required for a custom serialisation process | Properties
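
For instance, compression and batching can be enabled alongside a formatter. A minimal sketch, extending the Kafka publisher example above (the attribute values shown are illustrative, not defaults):

kafkaPublisher:
  ...
  serializer:
    formatter:
      json formatter: {}
    compress: true    # compress serialised payloads for network transfer
    batch: true       # combine multiple events into a single payload

Both attributes default to False, so they only need to be declared when the downstream system benefits from smaller or consolidated payloads.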

Custom transformer API

Convert a StreamEvent to a target domain type using a custom transformer developed with the Developer JDK.

Two methods are provided:

  1. CustomTransformer API: For code-based transformations.

  2. AVRO serialisation: For automated event translation.

Object CustomTransformer API

This capability is used when a use case requires a specific domain data type for downstream consumption. Developers are expected to provide domain-specific implementations of the CustomTransformer interface.

Refer to the custom transform example documentation for detailed instructions on implementing a custom transformer.

Example

The following example converts the StreamEvent to a StockAnalyticRecord using custom code.

This data object is then converted into a JSON object using the Kafka serialisation framework.

kafkaPublisher:
  ...
  serializer:
    transform: com.fractalworks.examples.banking.data.StockAnalyticRecordTransform
    key serializer: org.apache.kafka.common.serialization.IntegerSerializer
    value serializer: com.fractalworks.streams.transport.kafka.serializers.json.ObjectJsonSerializer

AVRO serialisation

This is an AVRO transform implementation built on the CustomTransformer API. To simplify usage, an avro serializer DSL element is provided, along with configurable attributes.

The transformer automatically maps StreamEvent attributes onto the target domain type's attributes using a provided AVRO IDL schema. Currently only local schema files are supported; schema registry support is available on request.

Integrate Joule with any existing system using already established data structures.

Example

kafkaPublisher:
  ...
  serializer:
    transform:
      avro serializer:
        schema: /home/myapp/schema/customer.avsc
    ...

Attributes schema

Attribute | Description | Data Type | Required
--- | --- | --- | ---
schema | Path and name of the schema file | String | Yes
field mapping | Custom mapping of source StreamEvent fields to target domain fields | Map<String,String> | No
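
As an illustration, field mapping can be declared alongside the schema. A minimal sketch building on the customer example above; the field names and the nested-map syntax are assumptions for illustration, not documented values:

kafkaPublisher:
  ...
  serializer:
    transform:
      avro serializer:
        schema: /home/myapp/schema/customer.avsc
        field mapping:             # hypothetical source-to-target field names
          customer_id: customerId
          full_name: name

When field mapping is omitted, attributes are matched automatically by name, as described above.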

Data formatters

Formatters are mainly used for direct storage, whereby data tools such as PySpark, Apache Presto, DuckDB, MongoDB, Postgres and MySQL can use the data directly without further transformation overhead.

Available implementations

See the data formatters documentation for configuration options; a minimal configuration sketch follows the list below.

JSON formatter

CSV formatter

Parquet formatter
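
By analogy with the JSON formatter example earlier, a Parquet formatter could be declared as below. This is a sketch: the parquet formatter key and its empty-map form are assumed from the json formatter example, and the actual options are listed in the data formatters documentation.

kafkaPublisher:
  ...
  serializer:
    formatter:
      parquet formatter: {}   # assumed key, by analogy with "json formatter"

The Kafka publisher is used here only because it is the connector shown throughout this page; storage-oriented connectors would be the more typical target for Parquet output.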
