Predictive stream processing

A solid foundation for a streaming ML predictions platform


Version 1.0.4

Overview

This release brings a number of new features, bug fixes, optimisations and general usability enhancements. The focus of this release has been providing a solid foundation for streaming ML predictions.

Features

  • Feature Engineering

  • JPMML Machine Learning

  • Processing auditing

  • SQL Query API

  • Web socket publisher

  • Project Templates


Feature Engineering

Joule provides a feature engineering processor that lets users define how features are created for predictive analytics use cases.

For each declared feature field, the processor generates an engineered value. Two methods are supported: raw values, and values computed using custom expressions and plugins. On completion, a feature map containing all the required features is generated and placed in the StreamEvent, ready for the next processor in the pipeline.

To get you started, out-of-the-box (OOTB) plugins are provided for the following functional categories:

  • Scripting

  • Scaling

  • Transform

Example

feature engineering:
  name: retailProfilingFeatures
  versioned: true
  features:
    as values:
      - location_code
      - store_id

    compute:
      spend_ratio:
        scripting:
          macro:
            expression: 1 - spend/avg_spend
            language: js
            variables:
              avg_spend: 133.78
      age:
        function:
          age binning:
            source field: date_of_birth
      day:
        function:
          day-of-week transform:
            source field: date
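To make the resulting feature map more concrete, the following Python sketch mimics what the configuration above might produce for a single event. It is not Joule code: the function name, the event layout and the choice of an ISO weekday number for the day feature are assumptions made for illustration, and the age binning feature is left out because its bin boundaries are not given here.

```python
from datetime import date

# Hypothetical stand-in for the "compute" section of the example above.
AVG_SPEND = 133.78  # the avg_spend variable from the scripting macro


def build_feature_map(event: dict) -> dict:
    """Return a feature map for one event (illustrative only)."""
    features = {}

    # "as values": copy the raw fields through unchanged.
    for field in ("location_code", "store_id"):
        features[field] = event[field]

    # "spend_ratio": same expression as the macro, 1 - spend/avg_spend.
    features["spend_ratio"] = 1 - event["spend"] / AVG_SPEND

    # "day": assumed here to be the ISO weekday number (Monday = 1).
    features["day"] = date.fromisoformat(event["date"]).isoweekday()

    return features


# Made-up sample event for illustration.
event = {
    "location_code": "LDN",
    "store_id": 42,
    "spend": 66.89,
    "date": "2023-05-01",
}
feature_map = build_feature_map(event)
```

A spend of exactly half the average yields a spend_ratio of 0.5, and the raw fields pass through untouched.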

Machine Learning

Joule provides a PMML predictor processor to perform streaming predictions / scoring. The implementation leverages the JPMML open source library developed by Villu Ruusmann.

Example

pmml predictor:
  name: irisScorer
  model filename: /home/joule/models/pmml/iris_rf.pmml
  response field: flowerPrediction
  audit configuration:
    target schema: ml_audit
    queue capacity: 5000
    flush frequency: 5 

Auditing

An optional configuration provides the ability to audit predictions to support model retraining, feature and prediction drift management, model observability, and any local business governance requirements.

The configuration dynamically creates an in-memory database table, using the process name as the target table name, along with REST endpoints that enable direct access and export functions.
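The audit behaviour described above can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the class name, the two-column table layout and the count-based flush are hypothetical, and the standard-library sqlite3 module stands in for the embedded in-memory database so that the example runs without extra installs. Joule's actual implementation may differ.

```python
import sqlite3


class PredictionAuditor:
    """Hypothetical buffered audit writer; not Joule's implementation."""

    def __init__(self, processor_name: str, queue_capacity: int):
        self.table = processor_name
        self.queue_capacity = queue_capacity
        self.queue = []
        # sqlite3 stands in for the embedded in-memory database.
        self.db = sqlite3.connect(":memory:")
        # Table named after the processor; the columns are assumed.
        self.db.execute(
            f'CREATE TABLE "{self.table}" (features TEXT, prediction TEXT)'
        )

    def record(self, features: str, prediction: str) -> None:
        self.queue.append((features, prediction))
        # Flush early once the buffer reaches its capacity.
        if len(self.queue) >= self.queue_capacity:
            self.flush()

    def flush(self) -> None:
        # A timer could also call this at the configured flush frequency.
        self.db.executemany(
            f'INSERT INTO "{self.table}" VALUES (?, ?)', self.queue
        )
        self.db.commit()
        self.queue.clear()

    def row_count(self) -> int:
        cur = self.db.execute(f'SELECT COUNT(*) FROM "{self.table}"')
        return cur.fetchone()[0]


auditor = PredictionAuditor("irisScorer", queue_capacity=2)
auditor.record('{"petal_length": 1.4}', "setosa")
pending_rows = auditor.row_count()  # first event is still buffered
auditor.record('{"petal_length": 4.7}', "versicolor")
flushed_rows = auditor.row_count()  # capacity reached, batch written
```

Buffering writes this way keeps the scoring path cheap, at the cost of losing any unflushed rows if the process stops unexpectedly.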

With these new features, Joule can now provide stream-based advanced analytics using PMML models within Docker containers. An example is illustrated below.

SQL Query API

Joule embeds DuckDB, an in-memory database, into the runtime process. The solution is ideal for supporting custom processor logic in use cases such as:

  • Hosting and accessing custom reference data

  • Scratchpad for stateful processing

  • Ad-hoc custom complex queries

  • Capturing and exporting streaming events

See documentation for further details.
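As a rough illustration of the reference data use case, the Python sketch below enriches events with a lookup against an in-memory SQL table. It does not use Joule's SQL Query API, which is not shown in these notes. The table, column and function names are hypothetical, and the standard-library sqlite3 module is used as a stand-in for the embedded database so that the example is self-contained.

```python
import sqlite3

# sqlite3 is a stand-in so the sketch runs with the standard library only.
db = sqlite3.connect(":memory:")

# Hypothetical reference data: store metadata keyed by store_id.
db.execute("CREATE TABLE stores (store_id INTEGER PRIMARY KEY, region TEXT)")
db.executemany(
    "INSERT INTO stores VALUES (?, ?)",
    [(1, "north"), (2, "south")],
)


def enrich(event: dict) -> dict:
    """Attach the region for the event's store, or None if unknown."""
    row = db.execute(
        "SELECT region FROM stores WHERE store_id = ?",
        (event["store_id"],),
    ).fetchone()
    return {**event, "region": row[0] if row else None}


known = enrich({"store_id": 2, "spend": 20.0})
unknown = enrich({"store_id": 9, "spend": 5.0})
```

Events whose store is missing from the reference table still pass through, with the region left as None so a later processor can decide how to handle them.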

Web Socket Support

Joule now supports publishing events through a Web Socket publisher. Events are serialised as JSON.

websocketPublisher:
  pathOverride: /joule/websocket/stream

See documentation for further details.

Project Templates

To kick-start custom transport and processor development, a template project is provided. The project can be found at this link.
