Outer stream joins

Immediately pass the first event received and initialised downstream processors

Objective

Outer joins are different from inner joins by immediately passing the first event received on the outer stream into the processing pipeline, which can be useful for initialising downstream processors.

Any additional events that meet the join criteria are also emitted as they arrive.

Outer stream joins are extremely useful to tackle the cold start problem in stream recommendation engines.

Uses

It is extremely useful for ideal to prime processing.

Priming can improve performance by reducing the latency to get initial results, which are then refined or updated as more data becomes available.

What is priming?

In stream processing, priming refers to initialising or setting up a process to start working as soon as possible with partial or incomplete data, often before all necessary data has arrived.

For example, with an outer join in stream processing, as soon as an event arrives on the outer stream, it’s immediately processed and sent downstream. This priming action allows downstream processors to begin working with the initial data, while waiting for more events to join and complete the full picture.

It’s especially useful for processes that depend on getting an initial seed of data to start calculations, aggregations, or other operations quickly.

Example

When a new customerId event is generated by the siteVisits stream, the join processor emits it for initial processing.

Subsequent matching events trigger additional join outputs, continuing until the join expiration policy takes effect.

Outer joins are set by *=.

streams join:
  expression: "sitevisits.customerId *= webpage.adclicks.customerId"
  merge events: true
  left policy:
    time to live: 30 minutes
  right policy:
    delete on join: true

Last updated