# Serialisation

## Overview

Joule provides event formatters and serialisers to transform `StreamEvent` objects into **structured formats** for stream processing, visualisation or analytics.

These tools enable seamless **integration with diverse systems** by converting raw data to formats like JSON, CSV or Parquet, tailored to downstream needs.

For advanced use cases, the [CustomTransformer API](#custom-transformer-api) allows developers to implement custom transformations, mapping `StreamEvent` objects to **domain-specific types or unique formats** not supported by standard serialisers. This flexibility ensures Joule can meet **specialised data integration requirements**.

{% hint style="info" %}
Each connector type has its own preferred serialisation method
{% endhint %}

This page serves as a guide for configuring data serialisation and formatting, with examples and options for effective data integration in Joule.

It covers the following:

1. <mark style="color:green;">**Serialisation DSL element**</mark>\
   Describes the `serializer` element, which defines how `StreamEvents` are serialised for downstream systems, using formatters like JSON, CSV and Parquet.
2. <mark style="color:green;">**Kafka example**</mark>\
   Provides an example Kafka publisher configuration that serialises `StreamEvents` as JSON events.
3. <mark style="color:green;">**Custom transformer API**</mark>\
   Covers the `CustomTransformer` interface for custom serialisation when unique data transformations are required.
4. <mark style="color:green;">**Avro serialiser**</mark>\
   Introduces how [AVRO](https://avro.apache.org/) can be used to transform Joule events to target domain events using a provided AVRO IDL schema file.
5. <mark style="color:green;">**Formatters**</mark>\
   Outlines available formatters (JSON, CSV, Parquet) with example configurations for data storage needs.

## Serialisation DSL element

The `serializer` DSL element defines how `StreamEvents` should be serialised for downstream consumption, specifying attributes such as data format and target configuration.

For example, when using Kafka to publish data, you can use a JSON formatter to structure `StreamEvents` into JSON objects, enabling **efficient data exchange** across multiple processes in a system topology.

### Example

This code snippet shows a basic setup for a Kafka publisher within Joule, where `StreamEvent` objects are converted to JSON and published to a target topic.

This setup enables **inter-process communication** in a larger use case topology, where Joule processes communicate by **publishing and subscribing to events**.

```yaml
kafkaPublisher:
  ...
  serializer:
    formatter:
      json formatter: {}
```

### Attributes schema

<table><thead><tr><th width="165.33333333333331">Attribute</th><th width="359">Description</th><th>Type</th></tr></thead><tbody><tr><td>transformer</td><td>Convert <code>StreamEvent</code> to a target domain type using a custom transformer developed with the Developer JDK</td><td><a href="../../../developer-guides/builder-sdk/connector-api/sinks/customtransformer-api">CustomTransformer API</a></td></tr><tr><td>formatter</td><td>Convert a <code>StreamEvent</code> to a target format such as JSON, CSV or Object</td><td><p><a href="serialisation/formatters">Formatter</a></p><p>Default: JSON</p></td></tr><tr><td>compress</td><td>Compress resulting serialised data for efficient network transfer</td><td><p>Boolean</p><p>Default: False</p></td></tr><tr><td>batch</td><td>Batch events into a single payload</td><td><p>Boolean</p><p>Default: False</p></td></tr><tr><td>properties</td><td>Specific properties required for a custom serialisation process</td><td>Properties</td></tr></tbody></table>
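As a sketch of how these attributes combine, the following configuration enables compression and batching alongside the JSON formatter. The attribute names are taken from the table above; their exact placement may vary by connector, so treat this as illustrative rather than definitive.

```yaml
kafkaPublisher:
  ...
  serializer:
    formatter:
      json formatter: {}
    compress: true   # gzip-style compression of the serialised payload
    batch: true      # group multiple events into a single payload
```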

## Custom transformer API

Convert `StreamEvent` to a target domain type using a custom transformer developed with the Developer JDK.

Two methods are provided:

1. <mark style="color:green;">**CustomTransformer API**</mark>\
   For code-based transformations.
2. <mark style="color:green;">**AVRO serialisation**</mark>\
   For automated event translation.

### Object CustomTransformer API

This capability is used when the use case **requires a specific domain data type** for downstream consumption. Developers provide domain-specific implementations by implementing the `CustomTransformer` interface.

Refer to the [custom transform example documentation](https://docs.fractalworks.io/joule/components/connectors/serialisers/serialisation/custom-transform-example) for detailed instructions on implementing a custom transformer.

#### Example

The following example converts the `StreamEvent` to a `StockAnalyticRecord` using custom code.

This data object is then converted into a JSON object using the Kafka serialisation framework.

```yaml
kafkaPublisher:
  ...
  serializer:
    transform:
      com.fractalworks.examples.banking.data.StockAnalyticRecordTransform
    key serializer:
      org.apache.kafka.common.serialization.IntegerSerializer
    value serializer:
      com.fractalworks.streams.transport.kafka.serializers.json.ObjectJsonSerializer
```

### AVRO serialisation

This is an AVRO transform implementation utilising the CustomTransformer API. To simplify usage, an `avro serializer` DSL element is provided, along with configurable attributes.

The transformer **automatically maps** `StreamEvent` attributes onto the target data domain type attributes using a provided AVRO schema IDL. Currently only local schema files are supported, with schema registry support available on request.

{% hint style="success" %}
Integrate Joule with any existing system using already established data structures
{% endhint %}

#### Example

```yaml
kafkaPublisher:
  ...
  serializer:
    transform:
      avro serializer:
        schema: /home/myapp/schema/customer.avsc
    ...
```

#### Attributes schema

<table><thead><tr><th width="200">Attribute</th><th width="269">Description</th><th width="182">Data Type</th><th data-type="checkbox">Required</th></tr></thead><tbody><tr><td>schema</td><td>Path and name of schema file</td><td>String</td><td>true</td></tr><tr><td>field mapping</td><td>Custom mapping of source <code>StreamEvent</code> fields to target domain fields</td><td>Map&#x3C;String,String></td><td>false</td></tr></tbody></table>
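Where source and target field names differ, the `field mapping` attribute from the table above can remap them. A minimal sketch follows; the field names (`avg_price`, `averagePrice`, `symbol`, `ticker`) are illustrative assumptions, not taken from the source.

```yaml
kafkaPublisher:
  ...
  serializer:
    transform:
      avro serializer:
        schema: /home/myapp/schema/customer.avsc
        # Hypothetical mapping: StreamEvent field -> target domain field
        field mapping:
          avg_price: averagePrice
          symbol: ticker
```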

## Data formatters

Formatters are mainly used for direct storage, whereby data tools such as PySpark, Apache Presto, DuckDB, MongoDB, Postgres and MySQL can use the data directly without further transformation overhead.

### Available Implementations

See [data formatters documentation](https://docs.fractalworks.io/joule/components/connectors/serialisers/serialisation/formatters) for configuration options.

<table data-view="cards"><thead><tr><th></th><th></th><th></th></tr></thead><tbody><tr><td></td><td><mark style="color:orange;"><strong>JSON formatter</strong></mark></td><td></td></tr><tr><td></td><td><mark style="color:orange;"><strong>CSV formatter</strong></mark></td><td></td></tr><tr><td></td><td><mark style="color:orange;"><strong>Parquet formatter</strong></mark></td><td></td></tr></tbody></table>
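By analogy with the JSON formatter example earlier on this page, a minimal sketch swapping in the CSV formatter might look as follows. Formatter-specific options, if any, are covered in the linked formatters documentation; an empty mapping is assumed here to accept defaults.

```yaml
kafkaPublisher:
  ...
  serializer:
    formatter:
      csv formatter: {}
```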
