Joule

Scaling

Feature scaling is a method used to normalize the range of independent variables or features of data. In data processing, it is also known as data normalisation.

Max Scaler

The Max Scaler scales data into the range -1 to 1. It scales each value relative to the column's absolute maximum, so a single extreme value determines the scale and the scaler is not suitable for data with outliers. Handle outliers in pre-processing before applying it.

Attributes

| Attribute | Description | Type | Required |
| --- | --- | --- | --- |
| absolute max | The column absolute max of the feature | Double | |

Example

features:
  compute:
    scaled_price:
      function:
        max scaler:
          source field: price
          variables:
            absolute max: 12.78
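
The arithmetic behind this configuration can be sketched in a few lines of Python. This is only an illustration, assuming the scaler divides each value by the configured absolute max (the page does not spell out the formula); the function name `max_scale` and the sample values are hypothetical and not part of Joule.

```python
def max_scale(value: float, absolute_max: float) -> float:
    """Scale a value by the column's absolute maximum.

    Assumed formula: value / |absolute max|.
    """
    if absolute_max == 0:
        raise ValueError("absolute max must be non-zero")
    return value / abs(absolute_max)


ABSOLUTE_MAX = 12.78  # value taken from the example configuration

in_range = max_scale(-6.39, ABSOLUTE_MAX)  # a value inside the training range
at_max = max_scale(12.78, ABSOLUTE_MAX)    # the training maximum itself
outlier = max_scale(40.0, ABSOLUTE_MAX)    # hypothetical unseen outlier

print(in_range, at_max, outlier)
```

Under this assumption, values seen in training stay within -1 to 1, while an outlier larger than the configured absolute max lands outside that range, which is why outliers should be handled first.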

Min Max Scaler

The Min Max Scaler transforms features by scaling each one to a given range. It scales and translates each feature individually so that it falls within the given range on the training set, e.g., between zero and one. For data that contains negative values, a range of -1 to 1 can be used.
The range is configurable, for example [0,1], [0,5] or [-1,1]. This scaler works well when the standard deviation is small and the distribution is not Gaussian, but it is sensitive to outliers.

Attributes

| Attribute | Description | Type | Required |
| --- | --- | --- | --- |
| min | The column min of the feature | | |
| max | The column max of the feature | | |
| interval | | Double array | |

Example

features:
  compute:
    scaled_price:
      function:
        minmax scaler:
          source field: price
          variables:
            min: 10.00
            max: 12.78
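
As a rough sketch of what this configuration computes, the snippet below applies the usual min-max formula with the min and max from the example. The function name `minmax_scale` is hypothetical, and the default interval of [0, 1] is an assumption, since the page lists the `interval` attribute without stating its default.

```python
def minmax_scale(value, col_min, col_max, interval=(0.0, 1.0)):
    """Map a value from [col_min, col_max] onto the target interval.

    The (0.0, 1.0) default is an assumption, not a documented Joule default.
    """
    low, high = interval
    if col_max == col_min:
        raise ValueError("min and max must differ")
    unit = (value - col_min) / (col_max - col_min)  # position within [0, 1]
    return low + unit * (high - low)


COL_MIN, COL_MAX = 10.00, 12.78  # values from the example configuration

at_min = minmax_scale(10.00, COL_MIN, COL_MAX)
at_max = minmax_scale(12.78, COL_MIN, COL_MAX)
signed = minmax_scale(10.00, COL_MIN, COL_MAX, interval=(-1.0, 1.0))

print(at_min, at_max, signed)
```

The column minimum maps to the lower bound of the interval and the column maximum to the upper bound; changing `interval` only stretches and shifts the result.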

Robust Scaler

The Robust Scaler is a median-based scaling method. Its formula is (Xi - Xmedian) / Xiqr, where Xiqr is the interquartile range, so it is far less affected by outliers than scalers based on the minimum, maximum or mean.
Because it uses the interquartile range, it absorbs the effect of outliers while scaling. The interquartile range (Q3 - Q1) spans the middle half of the data points. If you have outliers that might distort your results or statistics and you don't want to remove them, the Robust Scaler is a good choice.

Attributes

| Attribute | Description | Type | Required |
| --- | --- | --- | --- |
| median | The column median of the feature | Double | |
| q1 | The column first quartile (Q1) of the feature | Double | |
| q3 | The column third quartile (Q3) of the feature | Double | |
| iqr | The interquartile range, calculated as the difference between q3 and q1 | Double | |

Example

features:
  compute:
    scaled_price:
      function:
        robust scaler:
          source field: price
          variables:
            median: 8.78
            q3: 11.78
            q1: 7.67
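
The formula given above, (Xi - Xmedian) / Xiqr, can be illustrated with the values from this example. This is a minimal sketch, not Joule's implementation; the function name `robust_scale` and the outlier value are hypothetical, and the interquartile range is derived here as q3 - q1.

```python
def robust_scale(value, median, q1, q3):
    """Centre on the median and divide by the interquartile range (q3 - q1)."""
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("q1 and q3 must differ")
    return (value - median) / iqr


MEDIAN, Q1, Q3 = 8.78, 7.67, 11.78  # values from the example configuration

at_median = robust_scale(8.78, MEDIAN, Q1, Q3)  # the median maps to zero
at_q3 = robust_scale(11.78, MEDIAN, Q1, Q3)
outlier = robust_scale(100.0, MEDIAN, Q1, Q3)   # hypothetical extreme value

print(at_median, at_q3, outlier)
```

An extreme value still produces a large scaled result, but it does not change the median or quartiles much, so the scaling applied to the rest of the data stays stable.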

Standard Scaler

The Standard Scaler assumes the data is normally distributed within each feature and scales it so that the distribution is centered around 0 with a standard deviation of 1.
Centering and scaling happen independently for each feature, using the relevant statistics computed on the samples in the training set. If the data is not normally distributed, this is not the best scaler to use.

Attributes

| Attribute | Description | Type | Required |
| --- | --- | --- | --- |
| population mean | The column population mean of the feature | Double | |
| population std | The column population standard deviation of the feature | Double | |

Example

features:
  compute:
    scaled_price:
      function:
        standard scaler:
          source field: price
          variables:
            population mean: 11.15
            population std: 1.48
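
To show where the two variables come from and what the scaler does with them, the sketch below computes a population mean and population standard deviation from a small training column and applies the standard z-score formula, (x - mean) / std. The function name `standard_scale` and the sample prices are hypothetical; that Joule uses exactly this formula is an assumption based on the description above.

```python
from statistics import fmean, pstdev


def standard_scale(value, population_mean, population_std):
    """Centre on the population mean and divide by the population std."""
    if population_std == 0:
        raise ValueError("population std must be non-zero")
    return (value - population_mean) / population_std


# Hypothetical training column, used only to illustrate the statistics.
prices = [9.5, 10.2, 11.0, 12.1, 12.78]
mean = fmean(prices)   # population mean
std = pstdev(prices)   # population (not sample) standard deviation

scaled = [standard_scale(p, mean, std) for p in prices]
print(fmean(scaled), pstdev(scaled))
```

After scaling, the training column has a mean of approximately 0 and a population standard deviation of approximately 1, up to floating-point rounding.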