# Formatters

## Overview

Formatters are mainly used for direct storage: they serialise events so that data tools such as PySpark, Apache Presto, DuckDB, MongoDB, Postgres and MySQL can consume the data directly without further overhead.

This is useful for stream processing systems where **data from different sources or formats** needs to be **transformed into a consistent event** format for downstream processing.

## JSON <a href="#json-formatter" id="json-formatter"></a>

The standard JSON formatter converts processed events to a JSON string using the specified attributes.

### Example

```yaml
file:
  ...
  json formatter:
    date format: yyyy/MM/dd
    contentType: application/json
    indent output: false
```

### Attributes schema

<table><thead><tr><th width="200">Attribute</th><th width="220">Description</th><th width="222">Data Type</th><th data-type="checkbox">Required</th></tr></thead><tbody><tr><td>date format</td><td>Date format to apply to date fields</td><td><p>String</p><p>Default: yyyy/MM/dd</p></td><td>false</td></tr><tr><td>indent output</td><td>Apply indentation formatting</td><td><p>Boolean</p><p>Default: false</p></td><td>false</td></tr><tr><td>contentType</td><td>Type of content to inform receiving application</td><td><p>String </p><p>Default: application/json</p></td><td>false</td></tr><tr><td>encoding</td><td>Payload encoding method</td><td><p>String</p><p>Default: UTF-8</p></td><td>false</td></tr><tr><td>ext</td><td>File extension</td><td><p>String</p><p>Default: json</p></td><td>false</td></tr></tbody></table>
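To make the attributes concrete, the sketch below shows the shape of payload the JSON formatter could produce for a single event under the example configuration (`date format: yyyy/MM/dd`, `indent output: false`). The event fields (`symbol`, `price`, `eventTime`) are hypothetical and only illustrate the behaviour; this is not the formatter's implementation.

```python
import json
from datetime import datetime

# Hypothetical event with one date field.
event = {
    "symbol": "ABC",
    "price": 101.5,
    "eventTime": datetime(2024, 3, 15),
}

# Apply the configured date format (yyyy/MM/dd maps to %Y/%m/%d),
# then serialise without indentation.
formatted = {
    k: (v.strftime("%Y/%m/%d") if isinstance(v, datetime) else v)
    for k, v in event.items()
}
payload = json.dumps(formatted)  # compact, single-line JSON
print(payload)
```

With `indent output: true` the formatter would instead emit a pretty-printed document, at the cost of a larger payload.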

## CSV

The standard CSV formatter converts processed events to a CSV string using the specified attributes.

### Example

```yaml
file:
  ...
  csv formatter:
    contentType: text/csv
    encoding: UTF_8
    delimiter: "|"
```

### Attributes schema

<table><thead><tr><th width="200">Attribute</th><th width="220">Description</th><th width="222">Data Type</th><th data-type="checkbox">Required</th></tr></thead><tbody><tr><td>date format</td><td>Date format to apply to date fields</td><td><p>String</p><p>Default: yyyy/MM/dd</p></td><td>false</td></tr><tr><td>delimiter</td><td>Field delimiter</td><td><p>Character</p><p>Default: ","</p></td><td>false</td></tr><tr><td>contentType</td><td>Type of content to inform receiving application</td><td><p>String </p><p>Default: text/csv</p></td><td>false</td></tr><tr><td>encoding</td><td>Payload encoding method</td><td><p>String</p><p>Default: UTF-8</p></td><td>false</td></tr><tr><td>ext</td><td>File extension</td><td><p>String</p><p>Default: csv</p></td><td>false</td></tr></tbody></table>
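A downstream consumer must use the same delimiter the formatter was configured with. The sketch below parses a payload shaped like the example configuration's output (`delimiter: "|"`); the field names and values are hypothetical, purely to illustrate the round trip.

```python
import csv
import io

# Hypothetical payload as the CSV formatter might emit it with a "|" delimiter:
# a header row followed by one record per event.
payload = "symbol|price|eventTime\nABC|101.5|2024/03/15\n"

# Read it back with the matching delimiter.
reader = csv.DictReader(io.StringIO(payload), delimiter="|")
rows = list(reader)
print(rows)
```

A non-default delimiter such as `|` is useful when field values themselves may contain commas.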

## Parquet

Converts a `StreamEvent` to an Avro object using a target schema before writing it out as a Parquet formatted object.

### Example

```yaml
file:
  ...
  parquet formatter:
    schema path: /home/joule/outputschema.avro
    compression codec: SNAPPY
    temp file directory: /tmp
    contentType: binary/octet-stream
    encoding: UTF_8
```

### Attributes schema

<table><thead><tr><th width="200">Attribute</th><th width="220">Description</th><th width="222">Data Type</th><th data-type="checkbox">Required</th></tr></thead><tbody><tr><td>schema path</td><td>Path location for the Avro output schema </td><td>String</td><td>true</td></tr><tr><td>compression codec</td><td><p>Algorithm to use to compress file. Available types:</p><ul><li>UNCOMPRESSED</li><li>SNAPPY</li><li>GZIP</li><li>LZO</li><li>BROTLI</li><li>LZ4</li><li>ZSTD</li></ul></td><td><p>String</p><p>Default: UNCOMPRESSED</p></td><td>false</td></tr><tr><td>contentType</td><td>Type of content to inform receiving application</td><td><p>String </p><p>Default: binary/octet-stream</p></td><td>false</td></tr><tr><td>encoding</td><td>Payload encoding method</td><td><p>String</p><p>Default: UTF_8</p></td><td>false</td></tr><tr><td>ext</td><td>File extension</td><td><p>String</p><p>Default: parquet</p></td><td>false</td></tr><tr><td>temp file directory</td><td>Directory path for temp files</td><td><p>String</p><p>Default: ./tmp</p></td><td>false</td></tr></tbody></table>
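The file referenced by `schema path` is a standard Avro schema document. The fragment below is a minimal sketch of what such a file might contain; the record name, namespace and fields are hypothetical and would be replaced by your own event structure.

```json
{
  "type": "record",
  "name": "OutputEvent",
  "namespace": "com.example.joule",
  "fields": [
    {"name": "symbol", "type": "string"},
    {"name": "price", "type": "double"},
    {"name": "eventTime", "type": {"type": "long", "logicalType": "timestamp-millis"}}
  ]
}
```

Each `StreamEvent` is mapped onto this record before being written out in the Parquet format, so every field the downstream query engine needs must appear in the schema.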
