# MinIO S3

## Overview

**S3 support** is provided using the **MinIO Publisher Transport**. Processed event data is saved as files in MinIO, a high-performance, S3-compatible object storage system.

This setup supports cloud or local MinIO storage and uses predefined file formats for each bucket. Configurable options include:

1. Custom schemas
2. Batch size
3. Object naming formats
4. Retries

These options make Joule's MinIO integration suitable for scalable data storage solutions.

{% hint style="info" %}
**Driver details:** [io.minio:minio:8.5.4](https://mvnrepository.com/artifact/io.minio/minio/8.5.4)
{% endhint %}

## Examples & DSL attributes

The example configures a `MinIOPublisher` to save event data in a local S3-compatible bucket named `marketdata`.

Files are stored under the `stocks` object name and organised into directories using the `yyyyMMdd/HH` date format. The `parquet formatter` writes each file using the Avro schema defined in `marketDataSchema.avsc`.

```yaml
minioPublisher:
  name: "marketdata-S3Publisher"

  connection:
    endpoint: "https://localhost"
    port: 9000
    credentials:
      access key: "XXXXXX"
      secret key: "YYYYYYYYYYYYYYY"

  bucket:
    bucketId: "marketdata"
    object name: "stocks"
    date format: "yyyyMMdd/HH"
    versioning: ENABLED
    retries: 3

  serializer:
    formatter:
      parquet formatter:
        schema: "./avro/marketDataSchema.avsc"
        temp directory: "./tmp"

  batchSize: 500000  
```

### Attributes schema

<table><thead><tr><th width="171">Attribute</th><th width="280">Description</th><th width="188">Data Type</th><th data-type="checkbox">Required</th></tr></thead><tbody><tr><td>name</td><td>Name of the publisher</td><td>String</td><td>true</td></tr><tr><td>connection</td><td>Connection details</td><td>See <a href="#connection-attributes">Connection attributes</a> section</td><td>true</td></tr><tr><td>bucket</td><td>S3 bucket to write object data to</td><td>See <a href="#bucket-attributes">Bucket attributes</a> section</td><td>true</td></tr><tr><td>serializer</td><td>Serialisation configuration</td><td>See <a href="/pages/miyhMElxCDdwYCoBYMz6">Serialisation Attributes</a> section</td><td>true</td></tr><tr><td>batchSize</td><td>Number of events to be processed and written to a single file</td><td>Long<br>Default: 1024</td><td>false</td></tr></tbody></table>

### Connection attributes schema

<table><thead><tr><th width="175">Attribute</th><th width="280">Description</th><th width="190">Data Type</th><th data-type="checkbox">Required</th></tr></thead><tbody><tr><td>endpoint</td><td>S3 service endpoint</td><td>String<br>Default: https://localhost</td><td>false</td></tr><tr><td>port</td><td>Port the S3 service is hosted on</td><td>Integer<br>Default: 9000</td><td>false</td></tr><tr><td>url</td><td>Fully qualified URL endpoint, e.g. an AWS, GCP or Azure URL. When provided, this takes precedence over the endpoint setting</td><td>URL String</td><td>false</td></tr><tr><td>region</td><td>Region where the bucket is to be accessed</td><td>String</td><td>false</td></tr><tr><td>tls</td><td>Use a TLS connection</td><td>Boolean<br>Default: false</td><td>false</td></tr><tr><td>credentials</td><td>IAM access credentials</td><td>See <a href="#credentials-attributes">Credentials section</a></td><td>false</td></tr></tbody></table>
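As an illustration of the `url` attribute taking precedence over `endpoint` and `port`, the fragment below sketches a connection to a hosted S3 service. The attribute names come from the table above; the URL, region value and the choice to enable `tls` are assumptions for this example, not defaults.

```yaml
connection:
  # Assumed regional endpoint; when url is set, endpoint and port are not used
  url: "https://s3.eu-west-2.amazonaws.com"
  # Assumed region, chosen to match the URL above
  region: "eu-west-2"
  tls: true
  credentials:
    access key: "XXXXXX"
    secret key: "YYYYYYYYYYYYYYY"
```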

### Credentials attributes schema

For non-production use cases, the access/secret keys can be used to prove out functionality. When migrating to a production environment, implement a provider plugin using the provided `JouleProviderPlugin` interface; see the basic example below.

<table><thead><tr><th width="178">Attribute</th><th width="281">Description</th><th width="191">Data Type</th><th data-type="checkbox">Required</th></tr></thead><tbody><tr><td>access key</td><td>IAM user access key</td><td>String</td><td>false</td></tr><tr><td>secret key</td><td>IAM user secret key</td><td>String</td><td>false</td></tr><tr><td>provider plugin</td><td>Custom implementation of credentials ideal for production level deployments</td><td>JouleProviderPlugin implementation</td><td>false</td></tr></tbody></table>

### JouleProviderPlugin interface

The `JWTCredentialsProvider` below implements the `JouleProviderPlugin` interface, which defines methods for initialisation, validation and setting properties. No functionality is implemented in this case; it is a template for custom credential providers.

```java
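import java.util.Properties;

// Imports for JouleProviderPlugin, Provider and the exception types are omitted;
// take them from the Joule SDK version you build against.
public class JWTCredentialsProvider implements JouleProviderPlugin {
    @Override
    public Provider getProvider() {
        // Template only: return your credentials provider here
        return null;
    }

    @Override
    public void initialize() throws CustomPluginException {
        // Template only: set up anything the provider needs before use
    }

    @Override
    public void validate() throws InvalidSpecificationException {
        // Template only: check that the required properties are present
    }

    @Override
    public void setProperties(Properties properties) {
        // Template only: store the properties passed to the plugin
    }
}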
```

### Bucket attributes schema

<table><thead><tr><th width="200">Attribute</th><th width="306">Description</th><th width="142">Data Type</th><th data-type="checkbox">Required</th></tr></thead><tbody><tr><td>bucketId</td><td>Bucket name</td><td>String</td><td>true</td></tr><tr><td>object name</td><td>Object name to write events to</td><td>String</td><td>true</td></tr><tr><td>versioning</td><td>Ability to use object versioning. Valid values are either ENABLED or SUSPENDED</td><td>ENUM</td><td>false</td></tr><tr><td>bucket policy</td><td>Policy file location to be applied</td><td>String</td><td>false</td></tr><tr><td>partition by date</td><td>Write files using date partitioning</td><td>Boolean<br>Default: true</td><td>false</td></tr><tr><td>date format</td><td>Directory date format to apply when partitioning by date</td><td>String<br>Default: yyyyMMdd</td><td>false</td></tr><tr><td>custom directory</td><td>Custom directory to be applied after the date path. Useful when writing multiple objects to the same bucket and date partition, where each needs an independent directory</td><td>String</td><td>false</td></tr><tr><td>headers</td><td>Object header information</td><td>Map&#x3C;String,String></td><td>false</td></tr><tr><td>user metadata</td><td>User metadata applied to the object</td><td>Map&#x3C;String,String></td><td>false</td></tr></tbody></table>
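To show how the optional bucket attributes combine, here is a minimal sketch of a `bucket` block that partitions by date and hour, adds a custom directory after the date path, and attaches headers and user metadata. The attribute names are taken from the table above; the directory name, header and metadata entries are hypothetical values chosen only for illustration.

```yaml
bucket:
  bucketId: "marketdata"
  object name: "stocks"
  versioning: ENABLED
  partition by date: true
  # One directory per day, with a sub-directory per hour
  date format: "yyyyMMdd/HH"
  # Hypothetical directory applied after the date path
  custom directory: "nasdaq"
  # Hypothetical header and metadata entries
  headers:
    Content-Type: "application/octet-stream"
  user metadata:
    producer: "joule"
```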

## Additional resources

* Official [MinIO documentation](https://min.io/docs/minio/kubernetes/upstream/)
* MinIO [Docker image](https://hub.docker.com/r/minio/minio)


