Field Tokeniser

Tokenise attribute values into their component parts

Objective

Split complex, aggregated event attributes into independent attributes using custom field tokeniser plugins.

Uses

How this filter is used depends heavily on the context in which Joule is deployed. It therefore relies entirely on custom implementations written by business developers.

Listed below are some uses for this type of filter:

  1. Split an address provided as a single string into independent address components.

  2. Get a device code from a mobile IMEI code.

  3. Tokenise sentences ready for LLM processing.
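
As an illustration of the second use, the sketch below splits a 15-digit IMEI into its Type Allocation Code (the first eight digits, which identify the device model), serial number and check digit. It is a minimal sketch, not part of Joule: the ImeiTokenizer class and the output attribute names are hypothetical, and the FieldTokenizer interface is redeclared locally as a stand-in for the SDK interface so that the example compiles on its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Assumption: local stand-in for the Joule SDK FieldTokenizer interface,
// declared here only so the sketch compiles without the SDK on the classpath.
interface FieldTokenizer {
    Optional<Map<String, Object>> decode(Object o);
}

// Hypothetical tokeniser: splits a 15-digit IMEI into its component parts.
class ImeiTokenizer implements FieldTokenizer {

    @Override
    public Optional<Map<String, Object>> decode(Object o) {
        if (o instanceof String) {
            String imei = ((String) o).trim();
            // Only accept a plain 15-digit IMEI; anything else is left untokenised.
            if (imei.matches("\\d{15}")) {
                Map<String, Object> map = new HashMap<>();
                map.put("tac", imei.substring(0, 8));            // Type Allocation Code
                map.put("serialNumber", imei.substring(8, 14));  // device serial number
                map.put("checkDigit", imei.substring(14));       // final check digit
                return Optional.of(map);
            }
        }
        return Optional.empty();
    }
}
```

Returning Optional.empty() for anything that is not a 15-digit string follows the same convention as the LatitudeLongitudeDecoder plugin shown later on this page.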

Example & DSL attributes

This example takes an aggregated longitude and latitude value, splits it into independent fields, and adds them to the StreamEvent object. It uses the custom plugin LatitudeLongitudeDecoder provided in the telco project; see the plugin code below.

tokenizer enricher:
  tokenizers:
    longlat : com.fractalworks.streams.examples.telco.enricher.LatitudeLongitudeDecoder

Attributes schema

Plugin code

The plugin code takes a combined "longitude,latitude" string, splits it on the comma, and converts each part into a separate numeric event attribute.

A FieldTokenizer API is provided for developers to build and deploy custom implementations.

import com.fractalworks.streams.sdk.referenceData.FieldTokenizer;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class LatitudeLongitudeDecoder implements FieldTokenizer {

    public LatitudeLongitudeDecoder() {}

    @Override
    public Optional<Map<String, Object>> decode(Object o) {
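        // Only String values are tokenised; any other type yields an empty result.
        if (o instanceof String) {
            // The value is expected in "longitude,latitude" order.
            String[] co = ((String) o).split(",");
            Map<String, Object> map = new HashMap<>();
            map.put("latitude", Float.parseFloat(co[1]));
            map.put("longitude", Float.parseFloat(co[0]));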
            return Optional.of(map);
        }
        return Optional.empty();
    }
}
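
The plugin above assumes the value always contains two comma-separated numbers. A more defensive variant might look like the sketch below, which returns an empty result for malformed input instead of throwing. This is an illustration under stated assumptions rather than the telco project's code: SafeLongLatDecoder is a hypothetical class name, the input is assumed to be in "longitude,latitude" order as in the plugin above, and the FieldTokenizer interface is redeclared locally as a stand-in for the SDK interface so that the example compiles on its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Assumption: local stand-in for the Joule SDK FieldTokenizer interface,
// declared here only so the sketch compiles without the SDK on the classpath.
interface FieldTokenizer {
    Optional<Map<String, Object>> decode(Object o);
}

// Hypothetical defensive variant of the longitude/latitude decoder.
class SafeLongLatDecoder implements FieldTokenizer {

    @Override
    public Optional<Map<String, Object>> decode(Object o) {
        if (!(o instanceof String)) {
            return Optional.empty();
        }
        String[] parts = ((String) o).split(",");
        // Expect exactly two parts: longitude first, then latitude.
        if (parts.length != 2) {
            return Optional.empty();
        }
        try {
            Map<String, Object> map = new HashMap<>();
            map.put("longitude", Float.parseFloat(parts[0].trim()));
            map.put("latitude", Float.parseFloat(parts[1].trim()));
            return Optional.of(map);
        } catch (NumberFormatException e) {
            // A non-numeric part means the value cannot be tokenised.
            return Optional.empty();
        }
    }
}
```

Calling new SafeLongLatDecoder().decode("-0.1276, 51.5072") yields a map with longitude and latitude entries, while a value such as "not-a-coordinate" yields Optional.empty().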
