File watcher
Process large files using stream processing
Overview
Joule provides a file watcher that processes large files using stream processing.
The file watcher is designed for large files that have been fully received. Processing starts when a new file event is detected in the watch directory.
After processing, each file is moved to the local processed directory and marked with a completion timestamp.
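The watch, process, and move cycle described above can be sketched in a few lines of Python. This is an illustration only, not Joule's implementation: the function name `process_new_files`, the `handler` callback, and the timestamp placement in the file name are all assumptions, and the sketch scans the directory once instead of reacting to file system events.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, List


def process_new_files(
    watch_dir: Path,
    processed_dir: Path,
    handler: Callable[[Path], None],
    suffix: str = ".parquet",
) -> List[Path]:
    """Process every matching file in watch_dir, then move it to processed_dir.

    Hypothetical helper: a single polling pass standing in for an
    event-driven watcher.
    """
    processed_dir.mkdir(parents=True, exist_ok=True)
    moved: List[Path] = []
    for path in sorted(watch_dir.glob(f"*{suffix}")):
        if not path.is_file():
            continue
        handler(path)  # stand-in for the stream-processing step
        # Assumed naming scheme: insert a UTC completion timestamp before the extension.
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
        target = processed_dir / f"{path.stem}.{stamp}{path.suffix}"
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

Calling `process_new_files(Path("nasdaq/downloads"), Path("nasdaq/processed"), handler)` on a schedule approximates the behaviour described here. A production watcher would also need to confirm that a file has been fully written before processing it.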
Examples & DSL attributes
This example configures a file watcher named `nasdaq_quotes_file` to monitor the `nasdaq/downloads` directory for new `PARQUET` files (e.g., `nasdaq.parquet`).
After processing the file, it moves it to the `nasdaq/processed` directory and publishes the data to the `quotes` topic.
Attributes schema
Attribute | Description | Data Type | Required |
---|---|---|---|
topic | User defined topic to be used as the final endpoint component | String | |
file name | Name of file to process | String | |
file format | Expected file format to process. Defined as an enumeration; see below for supported file types | Enum Default: PARQUET | |
watch dir | User defined directory for files received | String | |
processed dir | Location where processed files are placed upon completion | String Default: processed | |
Supported File Types
PARQUET
ARROW_IPC
ORC
CSV
JSON
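To make the attribute table and its defaults concrete, the sketch below models them as a small Python data class. It is not part of Joule: `FileFormat`, `FileWatcherConfig`, and `parse_file_format` are hypothetical names, and only the attribute names, the defaults (`PARQUET`, `processed`), and the supported file types come from this page.

```python
from dataclasses import dataclass
from enum import Enum


class FileFormat(Enum):
    """Supported file types, as listed on this page."""
    PARQUET = "PARQUET"
    ARROW_IPC = "ARROW_IPC"
    ORC = "ORC"
    CSV = "CSV"
    JSON = "JSON"


@dataclass(frozen=True)
class FileWatcherConfig:
    """Hypothetical mirror of the file watcher attributes."""
    name: str
    topic: str
    file_name: str
    watch_dir: str
    file_format: FileFormat = FileFormat.PARQUET  # documented default
    processed_dir: str = "processed"              # documented default


def parse_file_format(value: str) -> FileFormat:
    """Map a case-insensitive string to a FileFormat, rejecting unknown types."""
    try:
        return FileFormat[value.strip().upper()]
    except KeyError:
        supported = ", ".join(f.name for f in FileFormat)
        raise ValueError(
            f"Unsupported file format {value!r}; expected one of: {supported}"
        ) from None


# Values taken from the nasdaq example. Whether "file name" includes the
# extension is an assumption.
config = FileWatcherConfig(
    name="nasdaq_quotes_file",
    topic="quotes",
    file_name="nasdaq.parquet",
    watch_dir="nasdaq/downloads",
    file_format=parse_file_format("parquet"),
    processed_dir="nasdaq/processed",
)
```

Validating the format up front means an unsupported type fails at configuration time, before any file arrives in the watch directory.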