v1.1.0 Streaming analytics enhancements

Build streaming analytics pipelines powered by machine learning, metrics and reference data using live data

Version 1.1.0

Overview

Joule’s latest release offers businesses a comprehensive solution to accelerate use case development to generate value while minimising risk. The platform leverages dynamic ML models, metrics, reference data, and observability to provide real-time actions and insights.

With Joule, businesses can streamline their development efforts and make informed decisions based on data-driven insights. Joule’s intuitive development platform and user-focused design make it easy for businesses to leverage the power of data and maximise their potential.

Features

Predictive Processor

JPMML model initialisation using local file and remote S3 stores
Dynamic model refresh using model update notifications
Offline prediction auditing that enables explainability, drift monitoring and model retraining

Avro support

Ability to process avro records for inbound and outbound events
Complex data types supported using custom mapping
Schema registry support

Minio S3 Transport

OOTB multi cloud S3 support
Publish and consume events and insights to/from hybrid hosted S3 buckets
Drive pipeline processing using S3 bucket notifications
Consumer supports following file formats: PARQUET, CSV, ARROW, ORC
Keep reference data up to date using external systems

Reference Data

Apply external data within stream processing tasks
In-memory reference data elements kept up-to-date using source change notifications
Support for key value and S3 stores
Reference data file loader utility

Rest Consumer APIs

File consuming endpoint that enable ease of integration to upstream systems
Joule event consumer endpoint to provide the ability to chain Joule processors within a cloud environment

File Watcher Consumer

File watcher that consumes and processes target files
Supported formats; Parquet, Json, CSV, ORC and Arrow IPC

Enhancements

Kafka

Confluent schema registry support for outbound events
Message partition support
Confluent and RedPanda support

Enricher processor

Query optimisation
SQL, OQL, and Key value enrichment support

Transports

Improved exception handling to fail on startup
Strict ordering

Apache Arrow

Integrated and leveraged to process file efficiently and of various file formats
Large file processing support

Optimisations

Processing optimisations that reduce both memory and CPU utilisation while increasing event throughput.
StreamEvent smart shallow cloning logic to reduce overall memory footprint while providing key data isolation
StreamEvent change tracking switch to reduce memory overhead

Upgrades

Javalin 5.6.3
Kafka 3.6.0
Avro 1.11.3
DuckDB 0.9.2

Bug Fixes

StreamEventCSVDeserializer

Fixed fields from holding only string values to correctly defined data types
Allowed for custom date format to be provided

StreamEventJSONDeserializer

Can now read an array of Json StreamEvent objects

JVM Configuration Additions

Require ‘--add-opens=java.base/java.nio=ALL-UNNAMED’ to be added to the java CLI due to Apache Arrow requirements
Applying the G1 GC regionalized and generational garbage collector to improved memory usage

Previousv1.2.0 Join Streams with stateful analytics Nextv1.0.4 Predictive stream processing

Last updated 11 months ago

Was this helpful?