Joule DSL Overview

The Joule platform provides a Domain Specific Language (DSL) for defining use cases in human-readable YAML syntax.
A Joule use case is formed from four functionally independent definitions. These are outlined below, with detailed documentation in further articles.

Use case definition

Use cases are declared by defining source data requirements, processing pipeline, and output requirements. Read the use case definition documentation for further information.

Example

The example below computes the min/max ask and bid values within a five-second tumbling window, grouped by symbol, and only publishes results for symbols other than 'A'.
stream:
  name: tumblingWindowQuoteStream
  eventTimeType: EVENT_TIME
  sources:
    - nasdaq_quotes_stream
  processing unit:
    pipeline:
      - time window:
          emitting type: tumblingQuoteAnalytics
          aggregate functions:
            MIN: [ask, bid]
            MAX: [ask, bid]
          policy:
            type: tumblingTime
            windowSize: 5000
  emit:
    eventType: windowQuoteEvent
    select: "symbol, ask_MIN, ask_MAX, bid_MIN, bid_MAX"
    having: "symbol != 'A'"
  group by:
    - symbol
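To illustrate the aggregation semantics of the definition above (this is an explanatory sketch in Python, not Joule's implementation), the sketch below buckets quote events into five-second tumbling windows keyed by symbol, computes min/max ask and bid, and drops symbol 'A'. The sample events are hypothetical.

```python
from collections import defaultdict

def tumbling_min_max(events, window_ms=5000):
    """Group events into tumbling windows keyed by (window, symbol),
    then emit min/max of ask and bid, skipping symbol 'A'."""
    buckets = defaultdict(list)
    for e in events:
        # Tumbling windows: each event belongs to exactly one
        # fixed-size, non-overlapping time bucket.
        window_start = (e["time"] // window_ms) * window_ms
        buckets[(window_start, e["symbol"])].append(e)

    results = []
    for (window_start, symbol), evs in sorted(buckets.items()):
        if symbol == "A":  # mirrors: having: "symbol != 'A'"
            continue
        results.append({
            "symbol": symbol,
            "window_start": window_start,
            "ask_MIN": min(e["ask"] for e in evs),
            "ask_MAX": max(e["ask"] for e in evs),
            "bid_MIN": min(e["bid"] for e in evs),
            "bid_MAX": max(e["bid"] for e in evs),
        })
    return results

# Hypothetical quote events (time in milliseconds)
quotes = [
    {"time": 1000, "symbol": "IBM", "ask": 101.2, "bid": 100.9},
    {"time": 3000, "symbol": "IBM", "ask": 101.5, "bid": 101.0},
    {"time": 4000, "symbol": "A",   "ask": 55.0,  "bid": 54.8},
    {"time": 6000, "symbol": "IBM", "ask": 101.1, "bid": 100.7},
]
windows = tumbling_min_max(quotes)
```

Note how the two IBM events at 1000 ms and 3000 ms fall into the same window, while the event at 6000 ms starts a new one; the symbol 'A' event is filtered out entirely.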

Data subscription

Users can subscribe to external data events through source connectors. For a detailed understanding of how to subscribe to data, read the data subscription documentation; for a complete list of available connectors, see the sources catalog.

Example

The example below connects to a Kafka cluster, consumes events from the quotes topic, and transforms each received quote object into an internal StreamEvent object.
consumer:
  name: nasdaq_quotes_stream
  sources:
    - kafkaConsumer:
        name: nasdaq_quotes_stream
        clusterAddress: localhost:9092
        consumerGroupId: nasdaq
        topics:
          - quotes
        deserializer:
          transform: com.fractalworks.streams.examples.banking.data.QuoteToStreamEventTransformer
          keyDeserializer: org.apache.kafka.common.serialization.IntegerDeserializer
          valueDeserializer: com.fractalworks.streams.transport.kafka.serializers.object.ObjectDeserializer
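The transform class above converts each consumed quote into Joule's internal StreamEvent. As an illustration of what such a transformer typically does (the envelope field names here are assumptions for the sketch, not Joule's actual StreamEvent schema), a Python analogue might look like:

```python
import time

def quote_to_stream_event(raw_quote, event_type="quote"):
    """Wrap a raw quote payload in a stream-event envelope:
    an event type, event/ingest timestamps, and the original
    fields carried as attribute values."""
    return {
        "type": event_type,
        # Prefer the producer's timestamp when present,
        # otherwise fall back to ingest time.
        "eventTime": raw_quote.get("timestamp", int(time.time() * 1000)),
        "ingestTime": int(time.time() * 1000),
        "values": {k: v for k, v in raw_quote.items() if k != "timestamp"},
    }

# Hypothetical raw quote as it might arrive off the quotes topic
raw = {"symbol": "MSFT", "ask": 330.4, "bid": 330.1,
       "timestamp": 1700000000000}
event = quote_to_stream_event(raw)
```

Keeping the transform at the deserialization boundary means the rest of the pipeline only ever sees one canonical event shape, regardless of the source format.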

Event publishing

Users can publish events to downstream data platforms through destination connectors. For a detailed understanding of how to publish data, read the event publishing documentation; for a complete list of available connectors, see the destinations catalog.

Example

The example below generates a pipe-delimited quoteWindowsStream CSV file from the tumblingWindowQuoteStream events.
publisher:
  name: filePublisher
  source: tumblingWindowQuoteStream
  sinks:
    - file:
        filename: quoteWindowsStream
        path: ./data/output/analytics
        batchSize: 1024
        timeout: 1000
        formatter:
          csvFormatter:
            contentType: "text/csv"
            delimiter: "|"
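Since the sink above writes pipe-delimited CSV, downstream consumers must read it with the matching delimiter. A minimal Python sketch, where the column names follow the window example earlier and the row values are made up:

```python
import csv
import io

# Simulate file contents produced with delimiter "|" as configured
# above; the data row is hypothetical.
data = io.StringIO(
    "symbol|ask_MIN|ask_MAX|bid_MIN|bid_MAX\n"
    "IBM|101.2|101.5|100.9|101.0\n"
)
reader = csv.DictReader(data, delimiter="|")
rows = list(reader)
```

A non-comma delimiter such as "|" avoids quoting issues when field values themselves contain commas, at the cost of requiring every consumer to be configured explicitly.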

Reference data

Stream processing often requires additional data to perform analytics, generally known as reference data. Data of this form typically updates at a much slower pace and is therefore managed differently, often held in data platforms that are not architected for low-latency reads. Joule overcomes this limitation with a low-latency read mechanism based on in-memory caching. Further information can be found in the reference data documentation.

Example

The example below connects to a distributed caching platform, Apache Geode, for low-latency reference data reads.
platformDataStores:
  name: bankingReferenceData
  dataStores:
    - geodeReferenceStores:
        name: bankingReferenceData
        locatorAddress: localhost
        locatorPort: 10334
        stores:
          nasdaqIndex:
            regionName: nasdaq-companies
            keyClass: java.lang.String
          holidays:
            regionName: us-holidays
            keyClass: java.lang.Integer
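To illustrate the lookup pattern behind in-memory reference stores (a generic sketch, not Joule's or Geode's API), each store behaves like a keyed map that is preloaded with slow-changing data and then queried per event. The store contents below are hypothetical:

```python
class ReferenceStore:
    """A toy keyed reference-data cache: preload slow-changing
    data once, then serve point lookups at in-memory speed."""

    def __init__(self, name):
        self.name = name
        self._data = {}

    def load(self, records):
        # Bulk-load a snapshot of the upstream reference data.
        self._data.update(records)

    def lookup(self, key, default=None):
        # O(1) in-memory read on the hot path of the pipeline.
        return self._data.get(key, default)

# Hypothetical contents for the stores named in the example above
nasdaq_index = ReferenceStore("nasdaqIndex")
nasdaq_index.load({"IBM": {"company": "International Business Machines"}})

holidays = ReferenceStore("holidays")
holidays.load({20250101: "New Year's Day"})

company = nasdaq_index.lookup("IBM")
```

The key classes declared per store (java.lang.String for nasdaqIndex, java.lang.Integer for holidays) correspond to the lookup key types used here: a symbol string and an integer date key.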