Pipelines

Data (stream) pipelines enable business-specific, real-time analytics use cases


Visit the use case concept article to understand what use cases are in Joule.

Overview

At the core of the platform is the use case, defined by users who understand the business's specific computations and needs. This page gives an overview of how a use case is defined within the Joule platform.

It describes a sample use case for setting up a real-time data stream that monitors and analyses stock quote data, using features such as window aggregation, metrics calculation, filtering and event grouping.

Key features of data pipelines

The following features equip users to build, manage and refine robust data-driven pipelines tailored to their business objectives; a minimal sketch of how they fit together follows the list.

  1. Data priming: Joule allows the system to be primed at initialisation with static contextual data and pre-calculated metrics, ensuring that initial data needs are met before processing begins. This setup is essential for accurate data handling and pre-population of relevant tables.

  2. Processing unit: The platform enables rapid and straightforward creation of use cases, providing tools for customisable computation and processing logic. This flexibility allows users to define and execute business-specific workflows quickly.

  3. Emit: Joule supports tailored output selection, allowing users to specify the exact fields to be included in the final output. Additionally, it provides options for final filtering, streamlining the data sent to downstream systems.

  4. Group by: Grouping similar events enables more efficient data aggregation and reduces output size, helping optimise processing and storage requirements for downstream systems.

  5. Telemetry auditing: With inbound and outbound event auditing, Joule supports rigorous testing, model validation and retraining processes. This enhances reliability and ongoing refinement of data models.
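
Together, these elements form the anatomy of a use case definition. The minimal sketch below shows how they map onto the YAML DSL; the block names are taken from the worked example later on this page, while the commented-out telemetry auditing key is an assumption included only to indicate where that configuration would sit.

stream:
  name: minimalExampleStream        # illustrative use case name
  eventTimeType: EVENT_TIME

  initialisation:                   # 1. Data priming: preload contextual data and metrics
    # sql import of Parquet files, etc. (see the full example below)

  processing unit:                  # 2. Processing unit: custom computation logic
    pipeline:
      - filter:
          expression: "ask > 0"     # illustrative processor

  emit:                             # 3. Emit: select and filter the published fields
    eventType: minimalEvent
    select: "symbol, ask"

  group by:                         # 4. Group by: partition emitted events
    - symbol

  # 5. Telemetry auditing is configured separately (key name is an assumption)
  # telemetry auditing: { enabled: true }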

Example

In this use case, we are setting up a real-time data stream pipeline called tumblingWindowQuoteStream to monitor and analyse stock quote data from a live source. This setup provides real-time analytics on stock quotes, capturing trends and key statistics for each stock symbol over defined time intervals.

The use case makes use of the metrics engine and tumbling window before publishing a filtered stream of events to a connected publisher.

stream:
  name: tumblingWindowQuoteStream
  eventTimeType: EVENT_TIME
  
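  # Data priming: preload the fxrates quote table from Parquet before processing begins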
  initialisation:
    sql import:
      schema: fxrates
      parquet:
        - table: quote
          asView: false
          files: [ 'fxrates.parquet' ]
          drop table: true
          index:
            fields: [ 'ccy' ]
            unique: false
            
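  # Processing unit: a metrics engine plus the event-processing pipeline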
  processing unit:
    metrics engine:
        policy:
          timeUnit: MINUTES
          frequency: 1
          startup delay: 2
  
        foreach metric compute:
          metrics:
            - name: BidMovingAverage
              metric key: symbol
              table definition: standardQuoteAnalyticsStream.BidMovingAverage (symbol VARCHAR, avg_bid_min FLOAT, avg_bid_max FLOAT)
              query:
                SELECT symbol,
                AVG(bid_MIN) OVER w AS avg_bid_min,
                AVG(bid_MAX) OVER w AS avg_bid_max
                FROM quotesStream.tumblingQuoteAnalytics_View
                WINDOW w AS (
                PARTITION BY symbol ORDER BY eventTime ASC
                RANGE BETWEEN INTERVAL 1 SECONDS PRECEDING
                AND INTERVAL 1 SECONDS FOLLOWING)
                ORDER BY 1;
              truncate on start: true
              compaction:
                frequency: 8
                timeUnit: HOURS
              
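    # Pipeline operators are applied in order: filter events, then compute tumbling window aggregates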
    pipeline:
      - filter:
          expression: "symbol != 'A'"
      - timeWindow:
          emittingEventType: tumblingQuoteAnalytics
          aggregateFunctions:
            MIN: [ask, bid]
            MAX: [ask, bid]
          policy:
            type: tumblingTime
            windowSize: 5000

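  # Emit: select output fields, join in the BidMovingAverage metric and apply a final filter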
  emit:
    eventType: windowQuoteEvent
    select: "symbol, ask_MIN, ask_MAX, bid_MIN, bid_MAX, BidMovingAverage.avg_bid_max;WHERE symbol=${symbol}"
    having: "symbol !='A'"

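  # Group by: partition emitted events by stock symbol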
  group by:
    - symbol

Types of elements for data pipelines

This example is split up into sections, one for each element of the data pipeline:

  • Data priming: Loads Joule with necessary startup data
  • Processing unit: Quickly builds custom business use cases
  • Emit: Selects and filters output for publishing
  • Group by: Aggregates events to streamline downstream data
  • Telemetry auditing: Tracks events for validation and testing