Quickstart

Go from your data sources to stream-processed insights in less than 15 minutes


When working with Joule you will edit files locally in a code editor and run projects using the Joule command scripts. If you prefer to build your projects within an Integrated Development Environment (IDE), clone the existing banking example project from here.

Alternatively, download the banking example project here.

Simply download, unzip and run the examples

Prerequisites

Getting started with Joule has minimal requirements, but to take full advantage of the platform some key technical capabilities are needed.

  • To configure Joule you should know the basics of the terminal. In particular, you should be comfortable with general bash commands for navigating your computer's directory structure.

  • Install Joule using the installation instructions for your operating system.

  • Create a GitLab account if you don't already have one.

  • Install the required platform tools; see the setting up the environment document.

Add the address of the Kafka host to your /etc/hosts file, e.g.

127.0.0.1 KAFKA_BROKER
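
A minimal way to append this entry from the terminal (assuming the broker is reachable on localhost and you have sudo rights) is:

echo "127.0.0.1 KAFKA_BROKER" | sudo tee -a /etc/hosts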

Getting started

We will start with a simple use case that demonstrates how to get up and running with the platform using the following repeatable steps.

The use case subscribes to a Kafka quotes topic, computes the first and last ask price per symbol using one-second tumbling windows, and then publishes the resulting events onto the analytics_view topic.

This project can be found on GitLab by following this link.

1. Connect to a data source

Define one or more event sources using the provided data source connectors.

Overview of the definition

  • Provide a logical name for the source definition

  • Define one or more channels to receive events

  • Joule will subscribe to events on the quotes Kafka topic

  • Received events are deserialised into a Joule StreamEvent using a user-defined parser to enable processing

Example Kafka subscription

consumer:
  name: nasdaq_quotes_event_stream
  sources:
    - kafkaConsumer:
        name: nasdaq_quotes_stream
        cluster address: KAFKA_BROKER:19092
        consumerGroupId: nasdaq
        topics:
          - quotes

        deserializer:
          parser: com.fractalworks.examples.banking.data.QuoteToStreamEventParser
          key deserializer: org.apache.kafka.common.serialization.IntegerDeserializer
          value deserializer: com.fractalworks.streams.transport.kafka.serializers.object.ObjectDeserializer
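
Once the Redpanda broker used later in this guide is running, it can be useful to confirm that the quotes topic exists and is receiving data before starting Joule. Assuming the broker container is named redpanda-0 (as in the steps below), a quick check is:

docker exec -it redpanda-0 rpk topic list
docker exec -it redpanda-0 rpk topic consume quotes --num 1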

2. Process events

Use case processing is defined as a pipeline of processing stages. Joule provides a set of out-of-the-box (OOTB) processors (see the documentation), along with an SDK that enables developers to extend the platform's capabilities.

Overview of the definition

  • A logical name is defined for the use case; this will be referenced in the next step

  • Processing constraints define when this stream can execute

  • Event processing will use the actual event time provided within the received event

  • The use case will subscribe to events from the nasdaq_quotes_stream data source configured in step 1.

  • Event telemetry is switched on to track every event received and published

  • The use case applies one-second tumbling window aggregate functions (FIRST and LAST of the ask price) grouped by symbol; a worked illustration follows the example below

  • A simple event projection emits the computed grouped events

Example tumbling window calculations

stream:
  name: quoteAnalyticsStreamProcessor
  validFrom: 2020-01-01
  validTo: 2025-12-31
  eventTimeType: EVENT_TIME
  sources: [ nasdaq_quotes_stream ]

  telemetry auditing:
    raw:
      clone events: true
      frequency: 10
    processed:
      clone events: false
      frequency: 10

  processing unit:
    pipeline:
      - time window:
          emitting type: tumblingQuoteAnalytics
          aggregate functions:
            FIRST: [ ask ]
            LAST: [ ask ]

          policy:
            type: tumblingTime
            window size: 1000

  emit:
    select: "symbol, ask_FIRST, ask_LAST"

  group by:
    - symbol
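
As a worked illustration of the window semantics (hypothetical values, not actual engine output): if three quotes for symbol IBM arrive within a single 1000 ms window with ask prices 134.10, 134.25 and 134.05, one event is emitted for that window and symbol:

symbol: IBM
ask_FIRST: 134.10
ask_LAST: 134.05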

3. Distribute processed event

Processed events can be distributed to something as simple as a file or a dashboard, or onto another streaming channel for further downstream processing. For this example we are using the Kafka sink connector. Further information on the available sinks can be found here.

Overview of the definition

  • Provide a logical name for the distribution definition

  • Bind to the use case defined in step 2; in this case the source is quoteAnalyticsStreamProcessor

  • Define one or more channels to distribute events to

  • The published event is created by mapping the internal Joule event to the domain type defined by the StockAnalyticRecordTransform implementation, which is then serialised to JSON

Example Kafka publisher configuration using a domain transform and JSON serialisation

publisher:
  name: streamingAnalyticsPublisher
  source: quoteAnalyticsStreamProcessor
  sinks:
    - kafkaPublisher:
        cluster address: KAFKA_BROKER:19092
        topic: analytics_view
        partitionKeys:
          - symbol

        serializer:
          transform: com.fractalworks.examples.banking.data.StockAnalyticRecordTransform
          key serializer: org.apache.kafka.common.serialization.IntegerSerializer
          value serializer: com.fractalworks.streams.transport.kafka.serializers.json.ObjectJsonSerializer

Deployment artefact

Now we bring together each deployment artefact (source, use case and sinks) to form the complete use case. A use case deployment is defined by a single app.env file which references these files. This approach lets you switch out the sources and sinks to suit your needs, e.g. development, testing and production deployments; a hypothetical production variant is sketched after the example below.

Example app.env file used by Joule to run a use case

JOULE_HOST_NAME=localhost
JOULE_JMX_PORT=1098
JOULE_MEMORY=-Xmx2G

SOURCEFILE=conf/sources/stockQuoteStream.yaml
ENGINEFILE=conf/usecases/baseTumblingWindows.yaml
PUBLISHFILE=conf/publishers/kafkaAnalytics.yaml
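
Because app.env only references configuration files, switching environments is just a matter of pointing at different definitions. A hypothetical production variant (the host name, memory setting and file paths below are illustrative and not part of the example project) might look like:

JOULE_HOST_NAME=prod-joule-01
JOULE_JMX_PORT=1098
JOULE_MEMORY=-Xmx8G

SOURCEFILE=conf/sources/prodStockQuoteStream.yaml
ENGINEFILE=conf/usecases/baseTumblingWindows.yaml
PUBLISHFILE=conf/publishers/prodKafkaAnalytics.yaml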

Get the example running

Joule provides the necessary scripts and configurations to get a use case running using either a Docker image or a local unpacked installation. We will use the local installation to familiarise you with the general directory structure, which will also help your understanding of the provided Docker image.

At the root of the directory we have the following structure:

├── META-INF
│   ├── joule
│   ├── services
│   └── simulator
├── bin
├── conf
│   ├── avro
│   ├── publishers
│   ├── simulator
│   ├── sources
│   └── usecases
├── infra
│   ├── confluent
│   ├── influx-grafana
│   ├── postgres
│   ├── rabbitmq
│   └── redpanda
├── data
│   └── csv
├── lib
└── userlibs

1. Start a local Redpanda (Kafka-compatible) broker

docker-compose -f infra/redpanda/docker-compose.yml up -d
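
To confirm the broker started successfully (assuming the compose file names the container redpanda-0, as used in the commands below), you can query the cluster:

docker exec -it redpanda-0 rpk cluster info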

2. Start the data simulator

This will generate simulated quotes based upon the provided NASDAQ CSV info file.

fractalworks-banking-example % ./bin/startQuotePublisher.sh 
Starting Quotes Simulator v1.1.0
ProcessId 11125

3. Start the Joule use case

This uses the app.env file to start the use case, which publishes the resulting analytic events onto the analytics_view topic.

fractalworks-banking-example % ./bin/startJoule.sh
Joule Version: 1.1.0
Applying environment file. <FULL-PATH>/bin/app.env
Starting Joule
ProcessId 11634

4. View the results

docker exec -it redpanda-0 rpk topic consume analytics_view --num 1
{
  "topic": "analytics_view",
  "key": "\u0000\u00015L",
  "value": "{\"symbol\":\"PHD\",\"time\":1706625828336,\"askFirst\":15.420006518870125,\"askLast\":15.420006518870125}",
  "timestamp": 1706625828093,
  "partition": 0,
  "offset": 0
}

Or from the Redpanda UI

Use this link to access the console

5. Stopping the processes

fractalworks-banking-example % ./bin/stopQuotePublisher.sh 
Quote simulator closed

fractalworks-banking-example % ./bin/stopJoule.sh 
Joule processed closed
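
Finally, if you no longer need the local broker, shut down the Redpanda containers using the same compose file that started them:

docker-compose -f infra/redpanda/docker-compose.yml down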
