MinIO S3

MinIO file producer using S3 cloud or locally hosted buckets

Overview

S3 support is provided by the MinIO Publisher Transport. Processed event data is saved as files in MinIO, a high-performance, S3-compatible object storage system.

This setup supports cloud or local MinIO storage and uses predefined file formats for each bucket. Configurable options include:

  1. Custom schemas

  2. Batch size

  3. Object naming formats

  4. Retries

These options make Joule's MinIO integration suitable for scalable data storage solutions.

Driver details: io.minio:minio:8.5.4

Examples & DSL attributes

The example configures a MinIOPublisher to save event data in a local S3-compatible bucket named marketdata.

Files are stored under the stocks object name and are organised into directories by the configured date format. They are written by the parquet formatter using the Avro schema defined in marketDataSchema.avsc.

minioPublisher:
  name: "marketdata-S3Publisher"

  connection:
    endpoint: "https://localhost"
    port: 9000
    credentials:
      access key: "XXXXXX"
      secret key: "YYYYYYYYYYYYYYY"

  bucket:
    bucketId: "marketdata"
    object name: "stocks"
    date format: "yyyyMMdd/HH"
    versioning: ENABLED
    retries: 3

  serializer:
    formatter:
      parquet formatter:
        schema: "./avro/marketDataSchema.avsc"
        temp directory: "./tmp"

  batchSize: 500000  

Attributes schema

| Attribute | Description | Data Type | Required |
| --- | --- | --- | --- |
| name | Name of the publisher | String | |
| connection | Connection details | | |
| bucket | S3 bucket to write object data to | | |
| serializer | Serialisation configuration | | |
| batchSize | Number of events to be processed and written to a single file | Long Default: 1024 | |
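
To illustrate what batchSize controls, the following is a minimal sketch of batching events into files. It is not Joule's implementation: the EventBatcher class and its fileWriter callback are hypothetical names introduced here, and the only behaviour taken from the table above is that up to batchSize events go into a single file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical illustration of batchSize semantics; not part of Joule. */
final class EventBatcher<T> {
    private final long batchSize;
    private final Consumer<List<T>> fileWriter; // assumed callback that writes one file
    private List<T> buffer = new ArrayList<>();

    EventBatcher(long batchSize, Consumer<List<T>> fileWriter) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.fileWriter = fileWriter;
    }

    /** Buffers an event and writes a file once batchSize events are held. */
    void add(T event) {
        buffer.add(event);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Writes any buffered events as one (possibly smaller) file. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> batch = buffer;
        buffer = new ArrayList<>();
        fileWriter.accept(batch);
    }

    public static void main(String[] args) {
        List<Integer> fileSizes = new ArrayList<>();
        EventBatcher<String> batcher =
                new EventBatcher<>(1024, batch -> fileSizes.add(batch.size()));
        for (int i = 0; i < 2500; i++) {
            batcher.add("event-" + i);
        }
        batcher.flush(); // write the final partial batch
        System.out.println(fileSizes); // prints [1024, 1024, 452]
    }
}
```

With the default of 1024, 2500 events would produce two full files and one partial file in this sketch. A larger value such as the 500000 used in the example configuration means fewer, larger files.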

Connection attributes schema

| Attribute | Description | Data Type | Required |
| --- | --- | --- | --- |
| endpoint | S3 service endpoint | String Default: https://localhost | |
| port | Port the S3 service is hosted on | Integer Default: 9000 | |
| url | Fully qualified URL of the endpoint, e.g. an AWS, GCP or Azure URL. Takes precedence over the endpoint setting when provided | URL String | |
| region | Region where the bucket is to be accessed | String | |
| tls | Use a TLS connection | Boolean Default: false | |
| credentials | IAM access credentials | | |
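
The precedence between url and endpoint/port described above can be sketched as follows. This is an illustrative helper only: EndpointResolver is a hypothetical class, not a Joule API, the AWS URL is just a sample value, and joining endpoint and port with a colon is an assumption.

```java
import java.net.URI;

/** Hypothetical helper, not part of Joule: picks the address a client would connect to. */
final class EndpointResolver {

    /**
     * Returns the fully qualified url when one is supplied; otherwise
     * combines endpoint and port, falling back to the documented defaults.
     */
    static URI resolve(String url, String endpoint, Integer port) {
        if (url != null && !url.isBlank()) {
            return URI.create(url); // url wins over endpoint/port
        }
        String host = (endpoint == null || endpoint.isBlank())
                ? "https://localhost" // documented default endpoint
                : endpoint;
        int resolvedPort = (port == null) ? 9000 : port; // documented default port
        return URI.create(host + ":" + resolvedPort);
    }

    public static void main(String[] args) {
        // Local MinIO: endpoint and port are combined.
        System.out.println(resolve(null, "https://localhost", 9000));
        // Cloud storage: the url setting overrides endpoint and port.
        System.out.println(resolve("https://s3.eu-west-2.amazonaws.com", "https://localhost", 9000));
    }
}
```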

Credentials attributes schema

For non-production use cases, the access and secret keys can be used to prove that the integration works. When migrating to a production environment, implement a provider plugin using the provided JouleProviderPlugin interface; a basic example is shown below.

| Attribute | Description | Data Type | Required |
| --- | --- | --- | --- |
| access key | IAM user access key | String | |
| secret key | IAM user secret key | String | |
| provider plugin | Custom implementation of credentials, ideal for production-level deployments | JouleProviderPlugin implementation | |

JouleProviderPlugin interface

The JWTCredentialsProvider class below implements the JouleProviderPlugin interface, which declares methods for initialisation, validation and setting properties. No functionality is implemented in this example; it is a template for custom credential providers.

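// Template only: every method is a stub to be completed by the implementer.
public class JWTCredentialsProvider implements JouleProviderPlugin {
    // Supplies the provider object; the template returns null.
    @Override
    public Provider getProvider() {
        return null;
    }

    // Initialisation hook; intentionally empty in the template.
    @Override
    public void initialize() throws CustomPluginException {
    }

    // Validation hook; intentionally empty in the template.
    @Override
    public void validate() throws InvalidSpecificationException {
    }

    // Receives the plugin's configuration properties; intentionally empty.
    @Override
    public void setProperties(Properties properties) {
    }
}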

Bucket attributes schema

| Attribute | Description | Data Type | Required |
| --- | --- | --- | --- |
| bucketId | Bucket name | String | |
| object name | Object name to write events to | String | |
| versioning | Enables object versioning. Valid values are ENABLED or SUSPENDED | ENUM | |
| bucket policy | Location of the policy file to be applied | String | |
| partition by date | Write files using date partitioning | Boolean Default: true | |
| date format | Directory date format to apply when partitioning by date | String Default: yyyyMMdd | |
| custom directory | Custom directory applied after the date path. Useful when writing multiple objects to the same bucket and date partition while keeping each in an independent directory | String | |
| headers | Object header information | Map<String,String> | |
| user metadata | User metadata applied to the object | Map<String,String> | |
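
As a rough illustration of how date format and custom directory shape the stored path, the sketch below formats a timestamp with the standard java.time.format.DateTimeFormatter. The exact key layout Joule produces is not specified here: the ordering date path, then custom directory, then object name is an assumption, any file suffix is omitted, and ObjectKeySketch is a hypothetical class.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical sketch of a date-partitioned object key; the layout is an assumption. */
final class ObjectKeySketch {

    /** Builds "<date path>[/<custom directory>]/<object name>". */
    static String objectKey(String dateFormat, String customDirectory,
                            String objectName, LocalDateTime time) {
        StringBuilder key = new StringBuilder();
        // A '/' in the pattern is emitted literally, creating nested directories.
        key.append(DateTimeFormatter.ofPattern(dateFormat).format(time));
        if (customDirectory != null && !customDirectory.isBlank()) {
            key.append('/').append(customDirectory);
        }
        return key.append('/').append(objectName).toString();
    }

    public static void main(String[] args) {
        LocalDateTime time = LocalDateTime.of(2024, 3, 15, 9, 30);
        // Settings from the example configuration.
        System.out.println(objectKey("yyyyMMdd/HH", null, "stocks", time));
        // prints 20240315/09/stocks
        // Same partition with a custom directory ("nasdaq" is a made-up value).
        System.out.println(objectKey("yyyyMMdd/HH", "nasdaq", "stocks", time));
        // prints 20240315/09/nasdaq/stocks
    }
}
```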
