Bucketing

Apply individual variance tolerances to protect the identity of the originating data

Objective

The objective of this page is to explain how date and number values in a StreamEvent can be obfuscated using variance tolerances to protect sensitive information, such as date of birth, salary and age.

This method is closer to blurring than to strict obfuscation: it adjusts each value within a specified range while broadly preserving the original distribution, so privacy is protected without compromising data utility.

Date variance

Each date value for a specified field will be varied by a random number of days, whilst maintaining the original variance, range and distribution.

This can be useful where it would otherwise be possible to identify individuals by an exact match, such as date of birth.

Example

This code defines an obfuscation strategy named dateBucketing applied to the dateOfBirth field.

It uses date bucketing with a variance of 30, meaning that the actual date of birth will be obscured by randomly shifting the date within a 30-day range.

This protects the exact date while maintaining some level of accuracy.

obfuscation:
  name: dateBucketing
  fields:
    dateOfBirth:
      date bucketing:
        variance: 30
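To make the behaviour concrete, here is a minimal Python sketch of date variance. It is not Joule's implementation: the function name `vary_date`, the symmetric shift (up to `variance` days earlier or later) and the uniform random choice are all assumptions made for illustration. The default of 120 mirrors the default listed in the attributes schema.

```python
import random
from datetime import date, timedelta
from typing import Optional


def vary_date(source: date, variance_days: int = 120,
              rng: Optional[random.Random] = None) -> date:
    """Shift a date by a random whole number of days.

    Assumption: the shift is drawn uniformly from
    [-variance_days, +variance_days]; Joule's actual sampling
    strategy is not documented on this page.
    """
    rng = rng or random.Random()
    offset = rng.randint(-variance_days, variance_days)  # inclusive bounds
    return source + timedelta(days=offset)


if __name__ == "__main__":
    date_of_birth = date(1990, 6, 15)
    # Equivalent in spirit to `variance: 30` in the YAML example
    print(vary_date(date_of_birth, variance_days=30))
```

Under these assumptions every shifted value stays within 30 days of the source date, so an exact-match lookup on date of birth no longer succeeds while age-based analysis remains approximately valid.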

Attributes schema

| Attribute | Description | Data Type | Required |
| --- | --- | --- | --- |
| variance | Maximum number of days to vary the source date | Integer | Default: 120 |

Number variance

Each number can be varied by a random percentage, whilst maintaining the original variance, range and distribution.

This can be useful where it would otherwise be possible to identify an individual by an exact match on a value such as salary.

Example

This code defines an obfuscation strategy called numberBucketing applied to two fields: salary and age.

  1. salary: The value of salary will be obscured by a variance of 0.25, meaning the value can fluctuate by 25% up or down.

  2. age: The value of age will be obscured by a variance of 0.10, meaning the age can fluctuate by 10% up or down.

This technique hides the exact values while maintaining general data accuracy within the specified variance.

obfuscation:
  name: numberBucketing
  fields:
    salary:
      number bucketing:
        variance: 0.25
    age: 
      number bucketing:
        variance: 0.10
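The percentage-based variation described above can be sketched in Python as follows. This is an illustration only, not Joule's implementation: the function name `vary_number` and the uniform sampling of the percentage change are assumptions. The default of 0.15 mirrors the default listed in the attributes schema.

```python
import random
from typing import Optional


def vary_number(value: float, variance: float = 0.15,
                rng: Optional[random.Random] = None) -> float:
    """Scale a number by a random percentage within +/- variance.

    Assumption: the percentage change is drawn uniformly from
    [-variance, +variance]; Joule's actual sampling strategy is
    not documented on this page.
    """
    rng = rng or random.Random()
    factor = 1.0 + rng.uniform(-variance, variance)
    return value * factor


if __name__ == "__main__":
    # Equivalent in spirit to the salary and age entries in the YAML example
    print(vary_number(50_000.0, variance=0.25))  # somewhere in 37,500 to 62,500
    print(vary_number(40.0, variance=0.10))      # somewhere in 36 to 44
```

Because the change is proportional to the value, larger salaries move by larger absolute amounts, which keeps the relative ordering of widely separated values broadly intact.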

Attributes schema

| Attribute | Description | Data Type | Required |
| --- | --- | --- | --- |
| variance | Variance multiplier applied to the random masking process | Double | Default: 0.15 |
