Parquet and CSV data imports for stream processing
Overview
This page describes how to configure data imports from Parquet and CSV files, how to define tables and indexes, and how the imported data can be used within a processing pipeline.
The examples below demonstrate how imported data is integrated into a stream-based analytics system, with calculations and data enrichment performed on the incoming data.
Parquet Import
Parquet-formatted files can be imported into the system. Note that an index cannot be created over a view.
Example
This example imports Parquet files into different schemas and tables. Some targets are defined as views (us_holidays), while others are regular tables (fxrates, bid_moving_averages). Non-unique indexes are created on specific fields, and existing tables are dropped before new data is imported.
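Joule's own import configuration is not reproduced here. As a minimal sketch of the same steps, the snippet below uses DuckDB in Python as an illustrative stand-in, not Joule's API. The file paths, the analytics schema, and the symbol column are assumptions made for the example.

```python
# Illustrative sketch of the Parquet import steps using DuckDB.
# This is NOT Joule's configuration syntax; paths, the 'analytics'
# schema, and the 'symbol' column are hypothetical.
import duckdb

con = duckdb.connect("imports.db")

# Create the target schemas if they do not already exist.
con.execute("CREATE SCHEMA IF NOT EXISTS reference_data")
con.execute("CREATE SCHEMA IF NOT EXISTS analytics")

# Existing tables are dropped before new data is imported.
con.execute("DROP TABLE IF EXISTS reference_data.fxrates")
con.execute("""
    CREATE TABLE reference_data.fxrates AS
    SELECT * FROM read_parquet('data/fxrates.parquet')
""")

con.execute("DROP TABLE IF EXISTS analytics.bid_moving_averages")
con.execute("""
    CREATE TABLE analytics.bid_moving_averages AS
    SELECT * FROM read_parquet('data/bid_moving_averages.parquet')
""")

# us_holidays is exposed as a view rather than a regular table.
con.execute("DROP VIEW IF EXISTS reference_data.us_holidays")
con.execute("""
    CREATE VIEW reference_data.us_holidays AS
    SELECT * FROM read_parquet('data/us_holidays.parquet')
""")

# Non-unique index on a specific field of a regular table.
# Attempting the same on the us_holidays view would fail, since an
# index cannot be created over a view.
con.execute("CREATE INDEX idx_fxrates_symbol ON reference_data.fxrates (symbol)")
```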
CSV Import
Data can be imported from CSV files using a supported set of delimiters. The key difference from Parquet imports is that the table definition can be controlled explicitly.
By default, Joule will try to create the target table from a sample of the data, assuming a header is present on the first row.
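As a sketch of this default behaviour, the snippet below again uses DuckDB in Python as an assumed stand-in rather than Joule's syntax: the engine samples the file, treats the first row as a header, and infers the column types itself.

```python
# Illustrative default CSV import with DuckDB (not Joule's syntax).
# No explicit column definitions: types are inferred from a sample
# and the first row is treated as a header.
import duckdb

con = duckdb.connect("imports.db")
con.execute("CREATE SCHEMA IF NOT EXISTS reference_data")
con.execute("""
    CREATE TABLE reference_data.fxrates_auto AS
    SELECT * FROM read_csv('data/fxrates.csv', header = true)
""")
```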
Example
This example imports a CSV file, fxrates.csv, into the nasdaq table in the reference_data schema, and specifies the table structure, including column types. Key settings include a custom delimiter (;), date and timestamp formats, a sample size, and disabled automatic detection of data types. The table is dropped before the import, and an index is created on the symbol field.
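A minimal sketch of these settings follows, once more using DuckDB in Python as an assumed stand-in for Joule's import configuration. The file path, the sample size value, the date and timestamp format strings, and every column name and type except symbol are hypothetical.

```python
# Illustrative explicit-schema CSV import with DuckDB (not Joule's
# syntax). Column names/types other than 'symbol' are hypothetical.
import duckdb

con = duckdb.connect("imports.db")
con.execute("CREATE SCHEMA IF NOT EXISTS reference_data")

# The table is dropped before the import.
con.execute("DROP TABLE IF EXISTS reference_data.nasdaq")

# Custom delimiter, explicit date/timestamp formats, a fixed sample
# size, and automatic type detection disabled in favour of supplied
# column types.
con.execute("""
    CREATE TABLE reference_data.nasdaq AS
    SELECT * FROM read_csv(
        'data/fxrates.csv',
        delim           = ';',
        header          = true,
        auto_detect     = false,
        sample_size     = 1024,
        dateformat      = '%Y-%m-%d',
        timestampformat = '%Y-%m-%d %H:%M:%S',
        columns         = {
            'symbol':     'VARCHAR',
            'rate':       'DOUBLE',
            'trade_date': 'DATE',
            'updated_at': 'TIMESTAMP'
        }
    )
""")

# Non-unique index on the symbol field.
con.execute("CREATE INDEX idx_nasdaq_symbol ON reference_data.nasdaq (symbol)")
```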