Formatters
Apply a standard data format to outgoing data
Overview
Formatters are mainly used for direct storage, so that data tools such as PySpark, Apache Presto, DuckDB, MongoDB, Postgres, MySQL, etc. can use the data directly without further overhead.
This is useful for stream processing systems where data from different sources or formats needs to be transformed into a consistent event format for downstream processing.
JSON
The standard JSON formatter converts processed events to a JSON string using the specified attributes.
Example
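As a hedged sketch of the behavior described above (the event shape, field names, and helper function are illustrative, not the formatter's actual API), JSON output under the default attributes might be produced like this. Note that the Java-style default pattern yyyy/MM/dd corresponds to %Y/%m/%d in Python terms:

```python
import json
from datetime import date, datetime

# Illustrative event; field names are hypothetical.
event = {
    "id": 42,
    "name": "sensor-a",
    "created": date(2024, 3, 15),
}

def format_event(event, date_format="%Y/%m/%d", indent_output=False, encoding="UTF-8"):
    """Sketch of a JSON formatter: date fields are rendered with the
    configured date format, then the event is serialized and encoded."""
    def render(value):
        # json.dumps calls this only for types it cannot serialize itself.
        if isinstance(value, (date, datetime)):
            return value.strftime(date_format)
        raise TypeError(f"unsupported type: {type(value)!r}")

    text = json.dumps(event, default=render, indent=2 if indent_output else None)
    return text.encode(encoding)

payload = format_event(event)
# b'{"id": 42, "name": "sensor-a", "created": "2024/03/15"}'
```

Setting indent_output=True mirrors the "indent output" attribute and produces a pretty-printed payload instead of a single line.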
Attributes schema
date format (String, default: yyyy/MM/dd): Date format to apply to date fields
indent output (Boolean, default: false): Apply indentation formatting
contentType (String, default: application/json): Type of content to inform the receiving application
encoding (String, default: UTF-8): Payload encoding method
ext (String, default: json): File extension
CSV
The standard CSV formatter converts processed events to a CSV string using the specified attributes.
Example
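A minimal Python sketch, assuming dict-shaped events with illustrative field names (this is not the formatter's actual API): a header row is emitted first, then one row per event, with dates rendered using the configured format and the configured delimiter separating fields:

```python
import csv
import io
from datetime import date

# Illustrative events; field names are hypothetical.
events = [
    {"id": 1, "name": "sensor-a", "created": date(2024, 3, 15)},
    {"id": 2, "name": "sensor-b", "created": date(2024, 3, 16)},
]

def format_events(events, delimiter=",", date_format="%Y/%m/%d", encoding="UTF-8"):
    """Sketch of a CSV formatter: header row from the first event's keys,
    then one data row per event, dates formatted per the configured pattern."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter)
    writer.writerow(list(events[0]))  # header row
    for event in events:
        writer.writerow(
            value.strftime(date_format) if isinstance(value, date) else value
            for value in event.values()
        )
    return buffer.getvalue().encode(encoding)

csv_payload = format_events(events)
```

Passing a different delimiter (for example "\t") mirrors the "delimiter" attribute below.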
Attributes schema
date format (String, default: yyyy/MM/dd): Date format to apply to date fields
delimiter (Character, default: ","): Field delimiter
contentType (String, default: text/csv): Type of content to inform the receiving application
encoding (String, default: UTF-8): Payload encoding method
ext (String, default: csv): File extension
Parquet
Converts a StreamEvent to an Avro object using a target schema format before writing it to a Parquet-formatted object.
Example
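The formatter reads the Avro output schema from the configured schema path. As an illustration only (the record name, namespace, and fields are hypothetical, not a schema this formatter requires), such a schema file might look like:

```json
{
  "type": "record",
  "name": "StreamEvent",
  "namespace": "com.example.events",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "created", "type": {"type": "int", "logicalType": "date"}}
  ]
}
```

Each StreamEvent is mapped onto this record shape before the batch is compressed with the configured codec and written out as Parquet.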
Attributes schema
schema path (String): Path location for the Avro output schema
compression codec (String, default: UNCOMPRESSED): Algorithm to use to compress the file. Available types: UNCOMPRESSED, SNAPPY, GZIP, LZO, BROTLI, LZ4, ZSTD
contentType (String, default: binary/octet-stream): Type of content to inform the receiving application
encoding (String, default: UTF_8): Payload encoding method
ext (String, default: parquet): File extension
temp file directory (String, default: ./tmp): Directory path for temp files