Processing unit

The processing unit is the key function Joule provides to the user.

Overview

At its simplest form, a processing unit provides two key functions; a stream processing pipeline and a metrics engine that generates metrics based upon a time schedule policy.

processing unit:              
    pipeline:
      // List of processors

    metrics engine:
      // Metrics declarations            

Pipeline

A pipeline is a sequence of processors that compute functions on events. Joule provides out-of-the-box a set of core processors to enable you to build useful use cases, see processor documentation to learn more about the available processors.

Example

This example filters events that match the symbol 'A' and then places the event into a tumbling window. on every 5 seconds, the two aggregation functions execute against the stored events which are grouped by symbol, see the use cases overview for the complete definition. This pipeline will generate one event per group by criteria i.e. symbol.

pipeline:
  - filter:
      expression: "symbol != 'A'"
  - time window:
      emitting type: tumblingQuoteAnalytics
      aggregate functions:
        MIN: [ask, bid]
        MAX: [ask, bid]
      policy:
        type: tumblingTime
        window size: 5000

The pipeline is defined as a list of processors whereby each processor defines its own DSL syntax.

Metrics Engine

The metrics engine provides the ability to compute complex metrics based upon SQL queries, the embedded SQL engine provides this feature which is enabled by default.

Example

metrics engine:
    policy:
      timeUnit: MINUTES
      frequency: 1
      startup delay: 2

    foreach metric compute:
      metrics:
        - 
          name: BidMovingAverage
          metric key: symbol
          table definition: standardQuoteAnalyticsStream.BidMovingAverage (symbol VARCHAR, avg_bid_min FLOAT, avg_bid_avg FLOAT,avg_bid_max FLOAT)                          
          query:
            SELECT symbol, 
            AVG(bid_MAX) AS mt_avg_bid_min,
            AVG(bid_MAX) AS mt_avg_bid_max
            OVER (
            PARTITION BY symbol ORDER BY eventTime ASC
            RANGE BETWEEN INTERVAL 1 SECONDS PRECEDING
            AND INTERVAL 1 SECONDS FOLLOWING)
            AS 'bid max Moving Average'
            FROM quotesStream.tumblingQuoteAnalytics_View
            ORDER BY 1;
          truncate on start: true
          compaction:
            frequency: 8
            timeUnit: HOURS

policy

Metrics are generated by applying a time schedule policy. The three below DSL elements provide the required setting.

Required

timeUnit

Time unit used for the set frequency and startup delay elements.

Supported timeUnits: SECONDS, MINUTES, HOURS

Default: MINUTES

frequency

The time period between calculations.

Default: 1 Minute

startup delay

Time to wait before the first metric calculation are perform.

Default: 5 Minutes

Metrics Computations

The foreach metric compute syntax defines a metric table, computation, management and assigns it to a named metric family.

name

A unique name for the metric. This is otherwise know as the metrics family name.

Required

metric key

This is required to dynamically generate optimised metric queries for user lookups and management functions.

Required

table definition

A SQL table needs to be defined for metrics to be stored and accessed. The definition required the schema to be added to the table name.

Required

query

The computation is an ANSI SQL query which is executed periodically with the results inserted into the defined metric table.

Required

Management

Metrics are managed on startup and during processing to support general good housekeeping and memory management. Default management processes are enabled on an hourly basis.

truncate on start

Truncates table on startup if set to true.

Optional: Default true

compaction

Compaction is used to remove old metrics over the period set.

Optional: Default hourly management

frequency

Optional: Default hourly

timeUnits

Time units applied to frequency setting.

Support Type: HOURS, MINUTES

Optional: Default HOURS

Last updated