# Data priming

{% hint style="info" %}
This is an optional feature that lets you **prime Joule** with the data an active use case requires.
{% endhint %}

## Overview

Advanced use cases often require contextual data to support calculations or complex business logic. Joule enables this by offering data priming at the **initialisation** and **enrichment processing** stages.

The initialisation process imports data at startup from **local files into an in-memory SQL database**, making it immediately available for use in processing.

{% hint style="info" %}
To see what formats can be imported, see the [Types of imports](https://docs.fractalworks.io/joule/components/pipelines/data-priming/types-of-import) page.
{% endhint %}

## Initialisation process

Joule’s initialisation process leverages an embedded SQL engine, enabling powerful features like [metrics](https://docs.fractalworks.io/joule/components/processors/enrichment/metrics), event capturing, data exporting and access to [contextual](https://docs.fractalworks.io/joule/components/contextual-data) data.

This imported data, typically static contextual information, plays a vital role in supporting key functions within the event stream pipeline.

Data made available through the initialisation process can be accessed through four main components:

1. [<mark style="color:green;">**Enricher processor**</mark>](https://docs.fractalworks.io/joule/components/processors/enrichment)\
   For adding contextual information to events.
2. [<mark style="color:green;">**Metrics engine**</mark>](https://docs.fractalworks.io/joule/components/analytics/metrics-engine)\
   For real-time calculations and metrics updates.
3. <mark style="color:green;">**Select projection**</mark>\
   For choosing specific fields for further processing.
4. <mark style="color:green;">**In-memory SQL API**</mark>\
   For direct data access and manipulation within Joule.
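As a sketch of the first access pattern, an enricher could look up primed reference data by key. The attribute names below are illustrative assumptions rather than the definitive enricher DSL; see the Enricher processor page for the actual syntax:

```yaml
# Hypothetical sketch only: attribute names are assumptions,
# not the confirmed enricher DSL.
enricher:
  fields:
    company_name:
      by query: "SELECT Name FROM reference_data.nasdaq_companies WHERE Symbol = ?"
      query fields: [ symbol ]
```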

<table><thead><tr><th width="121">Attribute</th><th width="356">Description</th><th width="179">Data Type</th><th data-type="checkbox">Required</th></tr></thead><tbody><tr><td>schema</td><td>Global database schema. When set, it applies to any import definition that does not define its own schema. Default: <code>reference_data</code></td><td>String</td><td>false</td></tr><tr><td>parquet</td><td>List of parquet data import configurations</td><td><a href="types-of-import#parquet-import">See parquet attributes</a></td><td>false</td></tr><tr><td>csv</td><td>List of CSV data import configurations</td><td><a href="types-of-import#csv-import">See CSV attributes</a></td><td>false</td></tr></tbody></table>
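Per the table above, a global `schema` can reduce repetition: imports that omit their own `schema` inherit the global one, while an explicit `schema` on an import overrides it. A minimal sketch, with illustrative table and file names:

```yaml
initialisation:
  data import:
    schema: reference_data          # global default schema
    csv:
      - table: nasdaq_companies     # inherits reference_data
        file: 'data/csv/nasdaq.csv'
    parquet:
      - schema: metrics             # overrides the global schema
        table: bid_moving_averages
        files: ['data/parquet/mvavgs-prime.parquet']
```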

## Examples & DSL attributes

The following example demonstrates how to import two separate data files into independent in-memory SQL database tables using the CSV and Parquet formats.

1. The CSV file contains Nasdaq company information; it is treated as **static reference data** and is therefore stored in the `reference_data` schema.
2. Meanwhile, the Parquet file loads pre-calculated metrics, priming the metrics engine within the `metrics` schema.

This setup enables efficient access to contextual data and metrics calculations during event processing.

{% hint style="success" %}
This feature can load and read files from existing databases!
{% endhint %}

```yaml
stream:
  ...
  initialisation:
    data import:
      csv:
        - schema: reference_data
          table: nasdaq_companies
          file: 'data/csv/nasdaq.csv'
          drop table: true
          index:
            fields: [ 'Symbol' ]
            unique: true
  
      parquet:
        - schema: metrics
          table: bid_moving_averages      
          files: ['data/parquet/mvavgs-prime.parquet']
          drop table: true
          index:
            fields: [ 'symbol' ]
            unique: false
```

### Attributes schema

These common DSL keywords are used by both the Parquet and CSV import methods.

<table><thead><tr><th width="145">Attribute</th><th width="366">Description</th><th width="138">Data Type</th><th data-type="checkbox">Required</th></tr></thead><tbody><tr><td>schema</td><td>Database schema in which the target table is created for the import</td><td>String</td><td>false</td></tr><tr><td>table</td><td>Target table to import data into</td><td>String</td><td>true</td></tr><tr><td>drop table</td><td>Drop the existing table before import. This causes the table to be recreated</td><td>Boolean<br><em>Default true</em></td><td>false</td></tr><tr><td>index</td><td>Create an index on the created table</td><td><a href="#index">See next</a></td><td>false</td></tr></tbody></table>
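Because `drop table` defaults to `true`, each import recreates its target table. To retain an existing table across imports, the flag can be disabled explicitly, as in this sketch (file and table names are illustrative):

```yaml
csv:
  - schema: reference_data
    table: nasdaq_companies
    file: 'data/csv/nasdaq-additions.csv'
    drop table: false   # retain the existing table instead of recreating it
```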

### Index

If this optional field is supplied, the index is recreated once the data has been imported.

<table><thead><tr><th width="176">Attribute</th><th width="279">Description</th><th>Data Type</th><th data-type="checkbox">Required</th></tr></thead><tbody><tr><td>fields</td><td>A list of table fields to base the <code>index</code> on</td><td>List of Strings</td><td>true</td></tr><tr><td>unique</td><td>True for a unique <code>index</code></td><td>Boolean<br><em>Default true</em></td><td>false</td></tr></tbody></table>
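For instance, a non-unique index spanning two fields could be declared as follows (the field names are illustrative):

```yaml
index:
  fields: [ 'symbol', 'exchange' ]
  unique: false   # allow duplicate key values across indexed rows
```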
