Bucketing

Apply individual variance tolerances to protect the identity of the originating data

Objective

The objective of this page is to explain how date and number values in a StreamEvent can be obfuscated using variance tolerances to protect sensitive information, such as date of birth, salary and age.

This method adjusts each value by a random amount within a specified range, so it is closer to blurring than to full obfuscation. The overall distribution of the data is broadly preserved, which protects privacy without compromising data utility.

Date variance

Each date value for a specified field will be varied by a random number of days, whilst maintaining the original variance, range and distribution.

This can be useful where individuals could otherwise be identified by an exact match on a value such as date of birth.

Example

This code defines an obfuscation strategy named dateBucketing applied to the dateOfBirth field.

It uses date bucketing with a variance of 30, meaning that each date of birth is obscured by shifting it by a random number of days, up to a maximum of 30.

This protects the exact date while maintaining some level of accuracy.

obfuscation:
  name: dateBucketing
  fields:
    dateOfBirth:
      date bucketing:
        variance: 30
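
To make the effect of the variance setting concrete, here is a minimal Python sketch of a date shift. It is not the product's implementation: the function name vary_date is hypothetical, and it assumes the shift is a whole number of days drawn uniformly from -variance to +variance, which this page does not confirm.

```python
import random
from datetime import date, timedelta
from typing import Optional


def vary_date(original: date, variance: int = 120,
              rng: Optional[random.Random] = None) -> date:
    """Shift `original` by a random whole number of days.

    Assumption: the offset is uniform over [-variance, +variance] days.
    The default of 120 mirrors the documented default for date bucketing.
    """
    if variance < 0:
        raise ValueError("variance must be non-negative")
    if rng is None:
        rng = random.Random()
    # randint is inclusive at both ends, so the shift never exceeds `variance`.
    offset_days = rng.randint(-variance, variance)
    return original + timedelta(days=offset_days)


# Usage: blur a date of birth by at most 30 days in either direction.
date_of_birth = date(1985, 6, 15)
blurred = vary_date(date_of_birth, variance=30, rng=random.Random(42))
print(date_of_birth, "->", blurred)
```

Passing a seeded random.Random makes the sketch repeatable for testing; a real deployment would normally want an unseeded source so that shifts cannot be reproduced.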

Attributes schema

| Attribute | Description | Data Type | Required |
| --- | --- | --- | --- |
| variance | Maximum number of days to vary the source date | Integer | Default: 120 |

Number variance

Each number can be varied by a random percentage, whilst maintaining the original variance, range and distribution.

This can be useful where it would otherwise be possible to identify an individual by an exact match on a value such as salary.

Example

This code defines an obfuscation strategy called numberBucketing applied to two fields: salary and age.

salary: The value of salary will be obscured by a variance of 0.25, meaning the value can fluctuate by up to 25% up or down.
age: The value of age will be obscured by a variance of 0.10, meaning the age can fluctuate by up to 10% up or down.

This technique hides the exact values while maintaining general data accuracy within the specified variance.

obfuscation:
  name: numberBucketing
  fields:
    salary:
      number bucketing:
        variance: 0.25
    age:
      number bucketing:
        variance: 0.10
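
The percentage variation described above can be sketched in Python as follows. This is an illustration only: vary_number is a hypothetical name, and the sketch assumes the value is multiplied by a factor drawn uniformly between 1 - variance and 1 + variance, which may differ from how the product actually applies its variance multiplier.

```python
import random
from typing import Optional


def vary_number(value: float, variance: float = 0.15,
                rng: Optional[random.Random] = None) -> float:
    """Scale `value` by a random factor close to 1.

    Assumption: the factor is uniform over [1 - variance, 1 + variance].
    The default of 0.15 mirrors the documented default for number bucketing.
    """
    if variance < 0:
        raise ValueError("variance must be non-negative")
    if rng is None:
        rng = random.Random()
    factor = 1.0 + rng.uniform(-variance, variance)
    return value * factor


# Usage: the same settings as the numberBucketing example above.
rng = random.Random(7)
blurred_salary = vary_number(50_000, variance=0.25, rng=rng)  # within 37,500 to 62,500
blurred_age = vary_number(40, variance=0.10, rng=rng)         # within 36 to 44
print(blurred_salary, blurred_age)
```

Note that the result is a floating-point number, so a whole-number field such as age would need rounding afterwards; whether the product rounds is not stated on this page.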

Attributes schema

| Attribute | Description | Data Type | Required |
| --- | --- | --- | --- |
| variance | Variance multiplier to be applied to random masking process | Double | Default: 0.15 |
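
Both schemas list a default for variance, which suggests the attribute can be omitted. The Python sketch below shows one way that fallback could work when both strategies are applied to a single record. Everything here is an assumption for illustration: apply_bucketing is a hypothetical helper, the event is modelled as a plain dictionary rather than a real StreamEvent, and the uniform random shift is a guess at the behaviour.

```python
import random
from datetime import date, timedelta

DEFAULT_DATE_VARIANCE_DAYS = 120  # documented default for date bucketing
DEFAULT_NUMBER_VARIANCE = 0.15    # documented default for number bucketing


def apply_bucketing(event: dict, fields: dict, rng: random.Random) -> dict:
    """Return a copy of `event` with the configured fields blurred.

    `fields` mirrors the `fields:` section of the configuration as a dict.
    Fields that are not configured are copied through unchanged.
    """
    blurred = dict(event)
    for field_name, strategy in fields.items():
        if field_name not in event:
            continue
        for kind, options in strategy.items():
            options = options or {}  # tolerate an empty strategy body
            if kind == "date bucketing":
                days = options.get("variance", DEFAULT_DATE_VARIANCE_DAYS)
                offset = timedelta(days=rng.randint(-days, days))
                blurred[field_name] = event[field_name] + offset
            elif kind == "number bucketing":
                fraction = options.get("variance", DEFAULT_NUMBER_VARIANCE)
                factor = 1.0 + rng.uniform(-fraction, fraction)
                blurred[field_name] = event[field_name] * factor
    return blurred


# Usage: dateOfBirth and age omit variance, so the defaults apply.
fields = {
    "dateOfBirth": {"date bucketing": {}},
    "salary": {"number bucketing": {"variance": 0.25}},
    "age": {"number bucketing": None},
}
event = {"dateOfBirth": date(1985, 6, 15), "salary": 50_000, "age": 40, "name": "unchanged"}
result = apply_bucketing(event, fields, random.Random(1))
print(result)
```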

