Feature engineering

Decorate a feature vector with enriched features specific to the deployed model

“Coming up with features is difficult, time-consuming, requires expert knowledge. 'Applied machine learning' is basically feature engineering.” — Prof. Andrew Ng.

Objective

Joule provides a feature engineering processor that lets users define how features are created, ready for predictive analytics use cases.

For each declared feature field, the processor generates an engineered value. Two methods are supported:

  • raw: the event field value is copied into the feature map unchanged

  • compute: the value is derived using custom expressions and plugins

On completion, a feature map containing all the required features is generated and placed in the StreamEvent, ready for the next processor in the pipeline.
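The flow described above can be sketched in a few lines of Python. This is a conceptual illustration only, not Joule's implementation: the function name `build_feature_map`, the dictionary-based event, and the `double_spend` feature are all hypothetical stand-ins for the StreamEvent and the configured feature functions.

```python
from typing import Any, Callable, Dict, List


def build_feature_map(
    event: Dict[str, Any],
    as_values: List[str],
    compute: Dict[str, Callable[[Dict[str, Any]], Any]],
) -> Dict[str, Any]:
    """Hypothetical sketch: combine raw and computed features into one map."""
    features: Dict[str, Any] = {}
    # Raw method: copy the named event fields unchanged.
    for field in as_values:
        features[field] = event[field]
    # Compute method: each output name maps to a function of the event.
    for output_name, fn in compute.items():
        features[output_name] = fn(event)
    return features


# A plain dict stands in for the StreamEvent (assumption for illustration).
event = {"location_code": "LDN-01", "store_id": 42, "spend": 100.0}
feature_map = build_feature_map(
    event,
    as_values=["location_code", "store_id"],
    compute={"double_spend": lambda e: e["spend"] * 2},
)
print(feature_map)
```

In Joule itself, the same two inputs are declared in the DSL under `as values` and `compute`, so no code is required for the standard cases.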

Example & DSL attributes

feature engineering:
  name: retailProfilingFeatures
  versioned: true
  features:
    as values:
      - location_code
      - store_id

    compute:
      spend_ratio:
        scripting:
          macro:
            expression: 1 - spend/avg_spend
            language: js
            variables:
              avg_spend: 133.78
      age:
        function:
          age binning:
            source field: date_of_birth
      day:
        function:
          day-of-week transform:
            source field: date

Top level attributes

Attribute | Description | Data Type | Required
name | Name of the feature set used by the prediction model | String |
versioned | Boolean flag that applies a unique version identifier to the resulting feature map | Boolean | Default: true
features | List of supported feature functions | List |

The features attribute provides two key elements, as values and compute. At least one of them must be defined.

feature engineering:
  name: retailProfilingFeatures
  versioned: true
  features:
    as values:
      - location_code
      - store_id

Attributes schema

Attribute | Description | Data Type | Required
as values | List of event fields whose values are copied into the feature map unchanged | List |
compute | Feature functions, each mapped to an output variable, executed against the passed event | List |

feature engineering:
  ...
  features:
    as values:
      - event field1
      - event field2
      
    compute:
      output_field:
        scripting:
          ...
      other_output_field:
        function:
          plugin_name:
            ... < plugin setting > ...
            event fields:
              - f
            variables:
              varname: value

Supported feature engineering

As value

This is the most basic function: the StreamEvent field value is copied into the feature map unchanged.

Example

The following example copies the location_code and store_id values directly into the feature map.

feature engineering:
  ...
  features:
    as values:
      - location_code
      - store_id

Expression based

Example

The following example computes, per event, the spend ratio using a JavaScript expression.

feature engineering:
  ...
  features:
    compute:
      spend_ratio:
        scripting:
          macro:
            expression: 1 - spend/avg_spend
            variables:
              avg_spend: 133.78
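To make the arithmetic of the expression concrete, here is a small Python sketch of the same calculation. It only mirrors the formula `1 - spend/avg_spend` with the declared `avg_spend` variable; the function name and the sample spend values are assumptions for illustration, and Joule evaluates the expression through its scripting engine rather than through code like this.

```python
def spend_ratio(spend: float, avg_spend: float = 133.78) -> float:
    """Mirror of the DSL expression: 1 - spend/avg_spend."""
    return 1 - spend / avg_spend


# Sample spend values are hypothetical.
print(spend_ratio(133.78))  # spend equal to the average gives 0.0
print(spend_ratio(66.89))   # spend at half the average gives 0.5
print(spend_ratio(200.00))  # spend above the average gives a negative ratio
```

A positive ratio therefore indicates below-average spend, and a negative ratio indicates above-average spend.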

Custom Plugins

Developers can add feature engineering capabilities by extending the AbstractFeatureEngineeringFunction class.

Example

The following example computes, per event, the scaled price using the MinMax algorithm.

This example extends the AbstractFeatureEngineeringFunction class.

feature engineering:
  ...
  features:
    compute:
      scaled_price:
        function:
          minmax scaler:
            source field: price
            variables:
              min: 10.00
              max: 12.78
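The page does not spell out the formula, so the sketch below assumes the standard min-max normalisation, (value - min) / (max - min), using the `min` and `max` variables from the example. The function name and the sample price are hypothetical, and the actual plugin may handle edge cases such as out-of-range values differently.

```python
def minmax_scale(value: float, minimum: float, maximum: float) -> float:
    """Standard min-max normalisation (assumed, not taken from Joule's source)."""
    if maximum == minimum:
        raise ValueError("min and max must differ")
    return (value - minimum) / (maximum - minimum)


# min and max come from the DSL example; the price of 11.39 is hypothetical.
scaled_price = minmax_scale(11.39, minimum=10.00, maximum=12.78)
print(scaled_price)  # approximately 0.5, the midpoint of the range
```

Prices at the configured minimum map to 0 and prices at the maximum map to 1, which keeps the feature on a fixed scale for the model.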

Available options

Joule provides a small set of out-of-the-box (OOTB) feature engineering functions.

Versioning

Every feature map created is versioned using a random UUID.

The version is placed directly into the resulting map and is accessed using the feature_version key.
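As a rough illustration of this behaviour, the Python sketch below stamps a feature map with a random UUID under the `feature_version` key. The function name and the choice to store the UUID as a string are assumptions; the page does not state how Joule represents the value internally.

```python
import uuid
from typing import Any, Dict


def add_version(feature_map: Dict[str, Any], versioned: bool = True) -> Dict[str, Any]:
    """Hypothetical sketch: attach a random UUID when versioning is enabled."""
    result = dict(feature_map)  # copy so the input map is left untouched
    if versioned:
        # Assumption: the version is stored as the UUID's string form.
        result["feature_version"] = str(uuid.uuid4())
    return result


versioned_map = add_version({"store_id": 42})
print(versioned_map["feature_version"])  # a different UUID on every run
```

Downstream consumers can then read `feature_version` to tie a prediction back to the exact feature map that produced it.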


Expression based: Joule core provides the ability to deploy declarative expressions using the custom analytics processor. This has been reused within the context of feature engineering to enable users to define custom calculations within the DSL.

Custom plugins: See the CustomUserPlugin API documentation for further details.

Available feature engineering options:

  • Scripting: Define custom analytics with declarative expressions

  • Scaling: Normalise data with various scaling methods

  • Transform: Generate analytics-ready features from data