Quickstart

Gain insights from your data sources with minimised friction using the Joule prototyping platform



Download the getting started project, clone the repository, and follow the instructions to run the examples.

When you start working with Joule you will be editing files locally using a code editor and running use case examples using Postman. If you prefer to build your projects within an Integrated Development Environment (IDE), clone the existing banking example project from its repository.

Prerequisites

Getting started with Joule has minimal requirements, but to take full advantage of the platform some key technical capabilities are needed.

Step 1

Clone the project to a local directory using the following command:

git clone https://gitlab.com/joule-platform/fractalworks-stream-gettingstarted.git

Step 2

Install the required platform tools if you want to experiment with the Joule SDK; see the setting up the environment document.

Note: To configure and run Joule it is important that you know some basics of the Terminal. In particular, you should understand general bash commands so you can navigate the directory structure of your computer easily.
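For example, the following commands move into the cloned project and then into the quickstart directory used throughout this guide (paths assume the default clone location):

cd fractalworks-stream-gettingstarted   # directory created by the git clone above
ls                                      # list the project contents
cd quickstart                           # scripts used in the next steps live here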

Running an example

Joule provides the necessary scripts and configurations to get a use case running using either a Docker image or a local unpacked installation. We shall use the local installation to get you familiar with the general directory structure, which will also help your understanding of the provided Docker image.

1. Start up the environment

For this you need to change into the quickstart directory and run a single command.

cd quickstart
./startupJoule.sh

This will start up the following containers ready for use case deployment:

  • Joule Daemon

  • Joule Database

  • Redpanda (lightweight Kafka implementation)
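Once the startup script completes, you can confirm the containers are running with a standard Docker command:

docker ps --format '{{.Names}}'   # lists the names of the running containers started above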

2. Deploy a use case

Both the banking and telco demo directories provide a set of examples in the form of a Postman collection and environment. These can be found under the examples directory. Now let's get you started with the banking Getting Started demo example using Postman.

  1. First, import the use case demos and environment files from the banking-demo examples directory.

  2. Set the environment to Joule.

  3. From the Getting started \ Deploy folder, click Run folder from the menu.

  4. Finally, execute the run order by clicking the Run Joule - Banking demo button.

This will deploy the source, stream, sink and use case binding definitions to the platform. Note that on a restart these settings will be rehydrated and the use case will automatically start.

3. Start the quote simulator

This will generate simulated quotes based upon the provided Nasdaq CSV info file.

quickstart % ./bin/startQuotePublisher.sh 
Starting Quotes Simulator v1.2.1-SNAPSHOT
ProcessId 52506
appending output to nohup.out
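The simulator runs in the background via nohup; to follow its output, tail the log file it appends to:

tail -f nohup.out   # run from the quickstart directory where the simulator was started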

4. View the results

docker exec -it joule-gs-redpanda-0 rpk topic consume analytics_view --num 1
{
  "topic": "analytics_view",
  "key": "\u0000\u00015L",
  "value": "{\"symbol\":\"PHD\",\"time\":1706625828336,\"askFirst\":15.420006518870125,\"askLast\":15.420006518870125}",
  "timestamp": 1706625828093,
  "partition": 0,
  "offset": 0
}
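The --num flag tells rpk how many records to read before exiting; to sample more of the published analytics, simply request a larger number:

docker exec -it joule-gs-redpanda-0 rpk topic consume analytics_view --num 10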

Or from the Redpanda UI

Use this link to access the console.

5. Stopping the processes

quickstart % ./bin/stopQuotePublisher.sh 
Quote simulator closed

quickstart % ./shutdownJoule.sh
Joule processes closed

Note there are many other examples within the getting started project. These are described within the README.md files.

Optional Step - Setup the environment

If you choose to build the environment for development purposes, run the command below. Please ensure you have the correct build environment as set out in the setting up the environment document.

gradle buildJouleDemoEnv

This builds both the banking and telco example components, copies across the configurations and libraries, and sets up the demo environment.

What's going on?

The banking getting started use case demonstrates core features that are reusable across all use cases; connecting to data, processing and distributing events.

Hence, we start with a simple use case that demonstrates how to get up and running with the platform using the following repeatable steps. The use case subscribes to a Kafka quotes topic, computes the first and last ask price per symbol using one-second tumbling windows, and publishes the resulting events onto the analytics_view topic.

1. Connect and subscribe to a streaming data source

Define one or more event sources using the provided data source connectors.

Overview of the definition

  • nasdaq_quotes_stream is the logical name for the source definition

  • Subscribe to events using the quotes Kafka topic

  • Received events are deserialised into a Joule StreamEvent using a user defined parser to enable processing

Example Kafka subscription

kafkaConsumer:
  name: nasdaq_quotes_stream
  cluster address: joule-gs-redpanda-0:9092
  consumerGroupId: nasdaq
  topics:
  - quotes
  deserializer:
    parser: com.fractalworks.examples.banking.data.QuoteToStreamEventParser
    key deserializer: org.apache.kafka.common.serialization.IntegerDeserializer
    value deserializer: com.fractalworks.streams.transport.kafka.serializers.object.ObjectDeserializer
  properties:
    partition.assignment.strategy: org.apache.kafka.clients.consumer.StickyAssignor
    max.poll.records: '7000'
    fetch.max.bytes: '10485760'
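If the consumer does not appear to receive any events, a quick check (reusing the Redpanda container name from the startup step) is to confirm the quotes topic exists:

docker exec -it joule-gs-redpanda-0 rpk topic list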

2. Process events

Use case processing is defined as a pipeline of processing stages. Joule provides a set of OOTB processors, see the processor documentation, along with an SDK to enable developers to extend the platform capabilities.

Overview of the definition

  • basic_tumbling_window_pipeline is the logical name of the streaming processing pipeline; it is referenced later in the use case binding file

  • Processing constraints, valid from and to dates, define when the use case can execute

  • Event processing will use the event time provided within the received event

  • The use case will subscribe to events from the nasdaq_quotes_stream data source configured in step 1.

  • The use case applies one-second tumbling window aggregate functions (FIRST and LAST of the ask price), grouped by symbol

  • A simple select projection emits the computed events

Example tumbling window calculations

stream:
  name: basic_tumbling_window_pipeline
  enabled: true
  eventTimeType: EVENT_TIME
  sources:
  - nasdaq_quotes_stream
  processing unit:
    pipeline:
    - time window:
        emitting type: tumblingQuoteAnalytics
        aggregate functions:
          FIRST:
          - ask
          LAST:
          - ask
        policy:
          type: tumblingTime
          window size: 1000
  emit:
    select: symbol, ask_FIRST, ask_LAST
  group by:
  - symbol
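As a sketch of how the window is tuned, the policy's window size is expressed in milliseconds (1000 above gives the one-second window); a five-second window would only change the policy block:

policy:
  type: tumblingTime
  window size: 5000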

3. Distribute processed events

Distribution of processed events can be as simple as writing to a file, a dashboard, or on to another streaming channel for a downstream process to perform further processing. For this example we are using the Kafka sink connector; further information on the available sinks can be found in the documentation.

Overview of the definition

  • Provide a logical name for the distribution definition

  • Bind to the use case; in this case it is streamingAnalyticsPublisher

  • Define one or more channels to receive events

  • The published event is created by mapping the internal Joule event to the domain type defined by the StockAnalyticRecordTransform implementation, which is then converted to JSON

Example Kafka publish connection using a custom transformation

kafkaPublisher:
  name: kafka_analytics_view
  cluster address: joule-gs-redpanda-0:9092
  topic: analytics_view
  partitionKeys:
  - symbol
  serializer:
    transform: com.fractalworks.examples.banking.data.StockAnalyticRecordTransform
    key serializer: org.apache.kafka.common.serialization.IntegerSerializer
    value serializer: com.fractalworks.streams.transport.kafka.serializers.json.ObjectJsonSerializer

Deployment artefact

Now we bring together each deployment artefact (source, stream and sinks) to form the desired use case. A use case is formed by a single app.env file which references these files; the use case file brings each descriptor together into a single unit of deployment. This method of deployment enables you to simply switch out the sources and sinks based upon your needs, e.g. development, testing and production deployments.

Binding file used to run a use case

use case:
  name: basic_twindow_analytics
  constraints:
    valid from: "2024-01-01T08:00:00.000Z"
    valid to: "2030-01-01T23:59:00.000Z"
  sources:
  - nasdaq_quotes_stream
  stream name: basic_tumbling_window_pipeline
  sinks:
  - kafka_analytics_view
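For example, a development or test deployment could keep the same source and stream but publish to a different sink simply by swapping the sinks entry; file_analytics_view below is a hypothetical sink name that would be defined in its own sink descriptor:

use case:
  name: basic_twindow_analytics
  constraints:
    valid from: "2024-01-01T08:00:00.000Z"
    valid to: "2030-01-01T23:59:00.000Z"
  sources:
  - nasdaq_quotes_stream
  stream name: basic_tumbling_window_pipeline
  sinks:
  - file_analytics_view   # hypothetical sink defined in a separate descriptor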
