Bucketing
Apply individual variance tolerances to protect the identity of the originating data
Objective
This page explains how date and number values in a StreamEvent can be obfuscated using variance tolerances to protect sensitive information such as date of birth, salary and age.
The method adjusts each value within a specified range while preserving the original distribution and approximate accuracy, protecting privacy without compromising data utility.
Because values are shifted rather than replaced, the technique is closer to blurring than to outright obfuscation.
Date variance
Each date value for a specified field will be varied by a random number of days, whilst maintaining the original variance, range and distribution.
This can be useful where it would otherwise be possible to identify individuals by an exact match, such as date of birth.
Example
This code defines an obfuscation strategy named dateBucketing applied to the dateOfBirth field.
It uses date bucketing with a variance of 30, meaning that the actual date of birth is obscured by randomly shifting it by up to 30 days.
This protects the exact date while keeping the value close to the original.
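The behaviour described above can be illustrated with a minimal Python sketch. It is not the product's implementation: the function name vary_date, the symmetric shift in either direction and the uniform random choice are all assumptions made for illustration. Only the meaning of variance (the maximum number of days to vary the source date) and its default of 120 are taken from this page.

```python
import random
from datetime import date, timedelta
from typing import Optional


def vary_date(value: date, variance: int = 120,
              rng: Optional[random.Random] = None) -> date:
    """Shift `value` by a random whole number of days.

    Assumption: the offset is drawn uniformly from
    [-variance, +variance]; the real strategy may differ.
    """
    if rng is None:
        rng = random.Random()
    offset_days = rng.randint(-variance, variance)  # inclusive on both ends
    return value + timedelta(days=offset_days)


# Hypothetical usage: blur a date of birth with a variance of 30 days.
date_of_birth = date(1985, 6, 15)
blurred = vary_date(date_of_birth, variance=30)
print(abs((blurred - date_of_birth).days) <= 30)
```

With a variance of 30 the result always lies within 30 days of the source date, so an exact match on date of birth is no longer reliable, while the value stays close to the original.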
Attributes schema
Attribute | Description | Data Type | Required |
---|---|---|---|
variance | Maximum number of days to vary the source date | Integer (default: 120) | |
Number variance
Each number can be varied by a random percentage, whilst maintaining the original variance, range and distribution.
This can be useful where it would otherwise be possible to identify an individual by an exact match on a value such as salary.
Example
This code defines an obfuscation strategy called numberBucketing applied to two fields: salary and age.
salary: The value of salary will be obscured by a variance of 0.25, meaning the value can fluctuate by up to 25% up or down.
age: The value of age will be obscured by a variance of 0.10, meaning the age can fluctuate by up to 10% up or down.
This technique hides the exact values while maintaining general data accuracy within the specified variance.
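As a rough illustration of number variance, the sketch below scales a value by a random factor within the configured tolerance. It is an assumption-laden example rather than the product's algorithm: the function name vary_number and the uniform choice of factor are hypothetical, and only the variance multiplier and its default of 0.15 come from this page.

```python
import random
from typing import Optional


def vary_number(value: float, variance: float = 0.15,
                rng: Optional[random.Random] = None) -> float:
    """Scale `value` by a random factor in [1 - variance, 1 + variance].

    Assumption: the factor is drawn uniformly; the real
    masking process may use a different distribution.
    """
    if rng is None:
        rng = random.Random()
    factor = 1.0 + rng.uniform(-variance, variance)
    return value * factor


# Hypothetical usage mirroring the example above.
salary = vary_number(50_000, variance=0.25)  # stays within 37,500 to 62,500
age = vary_number(40, variance=0.10)         # stays within 36 to 44
print(salary, age)
```

The result is a floating-point number. Whether a field such as age is rounded back to a whole number is left to the caller in this sketch.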
Attributes schema
Attribute | Description | Data Type | Required |
---|---|---|---|
variance | Variance multiplier to be applied to the random masking process | Double (default: 0.15) | |