Streaming analytics enhancements

Build streaming analytics pipelines powered by machine learning, metrics and reference data using live data

Version 1.1.0

Overview

Joule’s latest release offers businesses a comprehensive solution to accelerate use case development to generate value while minimising risk. The platform leverages dynamic ML models, metrics, reference data, and observability to provide real-time actions and insights.

With Joule, businesses can streamline their development efforts and make informed decisions based on data-driven insights. Joule’s intuitive development platform and user-focused design make it easy for businesses to leverage the power of data and maximise their potential.

Features

Predictive Processor

  • JPMML model initialisation using local file and remote S3 stores

  • Dynamic model refresh using model update notifications

  • Offline prediction auditing that enables explainability, drift monitoring and model retraining

Avro support

  • Ability to process avro records for inbound and outbound events

  • Complex data types supported using custom mapping

  • Schema registry support

Minio S3 Transport

  • OOTB multi cloud S3 support

  • Publish and consume events and insights to/from hybrid hosted S3 buckets

  • Drive pipeline processing using S3 bucket notifications

  • Consumer supports following file formats: PARQUET, CSV, ARROW, ORC

  • Keep reference data up to date using external systems

Reference Data

  • Apply external data within stream processing tasks

  • In-memory reference data elements kept up-to-date using source change notifications

  • Support for key value and S3 stores

  • Reference data file loader utility

Rest Consumer APIs

  • File consuming endpoint that enable ease of integration to upstream systems

  • Joule event consumer endpoint to provide the ability to chain Joule processors within a cloud environment

File Watcher Consumer

  • File watcher that consumes and processes target files

  • Supported formats; Parquet, Json, CSV, ORC and Arrow IPC,

Enhancements

Kafka

  • Confluent schema registry support for outbound events

  • Message partition support

  • Confluent and RedPanda support

Enricher processor

  • Query optimisation

  • SQL, OQL, and Key value enrichment support

Transports

  • Improved exception handling to fail on startup

  • Strict ordering

Apache Arrow

  • Integrated and leveraged to process file efficiently and of various file formats

  • Large file processing support

Optimisations

  • Processing optimisations that reduce both memory and CPU utilisation while increasing event throughput.

  • StreamEvent smart shallow cloning logic to reduce overall memory footprint while providing key data isolation

  • StreamEvent change tracking switch to reduce memory overhead

Upgrades

  • Javalin 5.6.3

  • Kafka 3.6.0

  • Avro 1.11.3

  • DuckDB 0.9.2

Bug Fixes

StreamEventCSVDeserializer

  • Fixed fields from holding only string values to correctly defined data types

  • Allowed for custom date format to be provided

StreamEventJSONDeserializer

  • Can now read an array of Json StreamEvent objects

JVM Configuration Additions

  • Require ‘--add-opens=java.base/java.nio=ALL-UNNAMED’ to be added to the java CLI due to Apache Arrow requirements

  • Applying the G1 GC regionalized and generational garbage collector to improved memory usage

Last updated