File watcher

Process large files using stream processing

Overview

Joule offers a file watcher that processes large files using stream processing.

The file watcher is designed to process large files efficiently once they have been fully received. Processing starts when a new file event is detected in the watch directory.
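Joule's internal reader is not documented here, so the following is only a minimal Python sketch of the stream-processing idea. It uses CSV (one of the supported file types) and the standard csv module, reading rows in fixed-size batches so that only one batch is held in memory at a time. The function name `stream_csv_batches` and the `batch_size` parameter are hypothetical, not Joule settings.

```python
import csv
from itertools import islice
from typing import Dict, Iterator, List


def stream_csv_batches(path: str, batch_size: int = 1000) -> Iterator[List[Dict[str, str]]]:
    """Yield rows of a CSV file in batches of at most batch_size rows.

    Only the current batch is kept in memory, so memory use does not grow
    with file size. batch_size is a hypothetical tuning knob for this sketch.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)  # first line is treated as the header
        while True:
            batch = list(islice(reader, batch_size))  # pull the next slice lazily
            if not batch:
                return  # end of file reached
            yield batch
```

Each batch could then be handed to whatever publishes to the configured topic; that publishing step is omitted here.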

After processing, each file is moved to the local processed directory and tagged with a completion timestamp to distinguish it from other processed files.
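The move-and-timestamp step can be sketched in Python as follows. The `<stem>_<UTC timestamp><suffix>` naming scheme and the `move_to_processed` helper are assumptions for illustration; Joule's actual file naming may differ.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def move_to_processed(src: str, processed_dir: str,
                      completed_at: Optional[datetime] = None) -> Path:
    """Move a processed file into processed_dir, tagging its name with a
    UTC completion timestamp (hypothetical naming scheme)."""
    # Default to "now"; normalize to UTC so the trailing "Z" is accurate.
    completed = (completed_at or datetime.now(timezone.utc)).astimezone(timezone.utc)
    src_path = Path(src)
    dest_dir = Path(processed_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    stamp = completed.strftime("%Y%m%dT%H%M%SZ")
    dest = dest_dir / f"{src_path.stem}_{stamp}{src_path.suffix}"
    shutil.move(str(src_path), str(dest))
    return dest
```

For example, moving nasdaq.parquet with a completion time of 2024-01-02 03:04:05 UTC would produce nasdaq_20240102T030405Z.parquet in the processed directory.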

Examples & DSL attributes

This example configures a file watcher named nasdaq_quotes_file to monitor the nasdaq/downloads directory for new PARQUET files (e.g., nasdaq.parquet).

The watcher publishes the file's data to the quotes topic and, once processing completes, moves the file to the nasdaq/processed directory.

file watcher:
  name: nasdaq_quotes_file
  topic: quotes
  file name: nasdaq.parquet
  file format: PARQUET
  watch dir: nasdaq/downloads
  processed dir: nasdaq/processed
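To make the trigger concrete, here is a hedged Python sketch of waiting for the configured `file name` to appear in the `watch dir`. Joule reacts to file-system events; because the Python standard library has no portable file-event API, this sketch polls instead, and it does not check that the file has been fully received. `wait_for_file`, `poll_seconds`, and `timeout` are hypothetical names.

```python
import time
from pathlib import Path
from typing import Optional


def wait_for_file(watch_dir: str, file_name: str, poll_seconds: float = 1.0,
                  timeout: Optional[float] = None) -> Optional[Path]:
    """Block until file_name exists in watch_dir and return its path.

    Polling stands in for file-system events here. Returns None if the
    optional timeout (in seconds) expires first.
    """
    target = Path(watch_dir) / file_name
    deadline = None if timeout is None else time.monotonic() + timeout
    while not target.exists():
        if deadline is not None and time.monotonic() >= deadline:
            return None  # gave up waiting
        time.sleep(poll_seconds)  # check again after a short pause
    return target
```

With the example configuration, `wait_for_file("nasdaq/downloads", "nasdaq.parquet")` would return the path once the file lands.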

Attributes schema

Attribute | Description | Data Type | Required
--- | --- | --- | ---
topic | User-defined topic used as the final endpoint component | String |
file name | Name of the file to process | String |
file format | Expected format of the file to process. Defined as an enumeration; see Supported File Types below | Enum (default: PARQUET) |
watch dir | User-defined directory where incoming files are received | String |
processed dir | Directory where files are placed once processing completes | String (default: processed) |
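The two documented defaults can be applied as a simple merge, sketched below in Python with the DSL attribute names as dictionary keys. `with_defaults` is a hypothetical helper, and it deliberately does not decide which attributes are required.

```python
from typing import Any, Dict

# Defaults taken from the attributes schema above.
DEFAULTS: Dict[str, str] = {
    "file format": "PARQUET",
    "processed dir": "processed",
}


def with_defaults(attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a file watcher's attributes with the documented
    defaults filled in. Explicitly set values always take precedence."""
    return {**DEFAULTS, **attrs}
```

Merging in this order means a value written in the configuration always overrides the default, while omitted attributes fall back to PARQUET and processed.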

Supported File Types

PARQUET
ARROW_IPC
ORC
CSV
JSON
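A configuration loader might validate the `file format` attribute against this list. The sketch below mirrors the five supported types as a Python enum; `FileFormat` and `parse_file_format` are illustrative names, not part of Joule, and matching is assumed to be exact (case-sensitive).

```python
from enum import Enum


class FileFormat(Enum):
    """Mirror of the supported file types listed above."""
    PARQUET = "PARQUET"
    ARROW_IPC = "ARROW_IPC"
    ORC = "ORC"
    CSV = "CSV"
    JSON = "JSON"


def parse_file_format(value: str) -> FileFormat:
    """Validate a 'file format' value, raising ValueError for unsupported types."""
    try:
        return FileFormat(value)  # lookup by value; raises ValueError if unknown
    except ValueError:
        supported = ", ".join(f.value for f in FileFormat)
        raise ValueError(
            f"unsupported file format {value!r}; expected one of: {supported}"
        ) from None
```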
