Stream Join

Join independent stream events to trigger advance analytical insights and dynamic business rules


The stream join operation within stream processing is core potent function. It reduces time to insight latency for complex event processing by correlating event streams in to a simple meaningful event flow. And thereby supports use case scenarios that require analysing and understanding data in real-time. This not to be confused with real-time data enrichment as that addresses the need for additional data that may have be computed within process or imported in as a static data set.

Example use cases

Here are some common use cases where stream joins are applied:

  1. Real-time ad performance monitoring: Gain real-time view on user ad clicks, impressions and user clicks within your website thereby enabling targeting ad spend and customer intelligence.

  2. Dynamic promotions: Recognise a new prospective customer on your site, track product clicks and apply a promotion rule to generate a targeted promotion for presentation on the current web page.

  3. Product recommendations: By joining user sessions, presented ads and site clicks a generate product recommendations and monitor reaction via a feedback loop.

  4. Customer 360 proactive support: Analyse multiple customer telemetry streams to generate next best support actions.

  5. Platform anomaly detection: Analyse and aggregate multiple device telemetry streams to determine compute and network utilisation abnormal loads.

streams join:
  expression: "sitevisit.customerId == webpage.adclicks.customerId"
  merge events: true
  left policy:
    time to live: 30 minutes
  right policy:
    delete on join: true

Using Inner stream Joins

When you need a strict join between two event streams an inner join is requires. This operator will only emitted an event on a successful join otherwise it will wait until a join has occurred, depending upon policay configuration.

streams join:
  expression: "sitevisit.customerId == webpage.adclicks.customerId"
  merge events: true
  left policy:
    time to live: 30 minutes
  right policy:
    delete on join: true

Using Outer stream Joins

Outer joins operate slightly differently to inner with respect to when the first event received on the outer stream. Whenever a new event on outer stream is received this immediately is passed on within the pipeline processing, (i.e. ideal to prime processing), and thereafter any events that cause a join are emitted.

streams join:
  expression: "sitevisits.customerId *= webpage.adclicks.customerId"
  merge events: true
  left policy:
    time to live: 30 minutes
  right policy:
    delete on join: true

For the above example when the sitevisits stream generates a new customerId event this will be emitted by the join processor for initial processing. Whereas further any joins will cause new join events to be emitting until join expiry policy is invoked.

Join Attributes

These attributes define how the join is performed with respect to the expression, how to create the emitted event structure etc,.

AttributeDescriptionData TypeRequired

expression

Simple join expression that evaluates two streams

String

merge events

Flag that either merge event attributes into a flattened event structure or places each event

Boolean Default: true

event type

User defined type of emitted event

String Default: left type - right type (i.e. typeA-typeB)

left / right join

See Join attribute section

Policy attribute Default

Policy Attributes

Fine grained control can be applied to how joins are handled through the use of the policy attribute. This becomes important for large and fast event stream with respect to memory and processing overhead.

AttributeDescriptionData TypeRequired

delete on join

Delete event from state management on a successful join

Boolean Default: false

time to live

Expire configuration for event stored within the state management system. Supported time units include nanoseconds, microseconds, milliseconds, seconds, hours and days

Formatted string Default: 60 minutes

Default configuration

delete on join is set to false and therefore will join events until the time to live setting is honoured

time to live is set for 60 minutes per stored event. This is refreshed every time a new event is received with the same join attribute value

Last updated