Transform
Generate analytics-ready features from data
Objective
Feature engineering prepares raw data for analysis by creating new, insightful features:
Log transform: Applies a log function to positive values; commonly used to reduce skew in data.
Day of week transform: Converts a date to its day of the week as a number (1-7).
Day binning: Categorises a date as a weekday (1) or weekend (2).
Age binning: Categorises ages into specified age ranges for easier analysis.
Each method produces targeted features, simplifying data for analytics.
Log transform
The log transform replaces each value x of a variable with log(x), where x must be strictly positive (x > 0).
Example
Attributes schema
Attribute | Description | Type | Required |
---|---|---|---|
source field | The column to perform the calculation upon | Double |
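
The calculation described above can be sketched in Java as follows. This is only an illustration, not the product's implementation: the class and method names are hypothetical, the natural logarithm is assumed because the source does not state a base, and returning an empty value for non-positive input is likewise an assumption.

```java
import java.util.OptionalDouble;

// Hypothetical helper illustrating the log transform; not part of the product API.
final class LogTransformSketch {

    // Returns ln(x) for strictly positive x.
    // Assumption: non-positive or NaN input yields an empty result instead of an error.
    static OptionalDouble logTransform(double x) {
        if (Double.isNaN(x) || x <= 0.0) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(Math.log(x));
    }
}
```

For instance, `LogTransformSketch.logTransform(1.0)` holds 0.0, while `logTransform(0.0)` and `logTransform(-5.0)` are empty under the assumption above.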
Day of week transform
Converts the passed date object to its day of the week as a number between 1 and 7, where the week starts on Monday (Monday = 1).
Supported date objects:
java.time.LocalDate
java.sql.Date
org.joda.time.DateTime
Example
Attributes schema
Attribute | Description | Type | Required |
---|---|---|---|
source field | The column to perform the calculation upon | Double |
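
As a rough illustration of the Monday = 1 numbering, the sketch below maps two of the supported date types to a day number using standard JDK calls. The class and method names are hypothetical. `org.joda.time.DateTime` is left out to keep the example dependency-free; its `getDayOfWeek()` uses the same ISO numbering.

```java
import java.time.LocalDate;

// Hypothetical helper illustrating the day of week transform; not part of the product API.
final class DayOfWeekSketch {

    // java.time numbers days per ISO-8601: Monday = 1 ... Sunday = 7.
    static int dayOfWeek(LocalDate date) {
        return date.getDayOfWeek().getValue();
    }

    // java.sql.Date is converted to a LocalDate first, then handled the same way.
    static int dayOfWeek(java.sql.Date date) {
        return dayOfWeek(date.toLocalDate());
    }
}
```

With this sketch, `dayOfWeek(LocalDate.of(2024, 1, 1))` gives 1 (a Monday) and `dayOfWeek(LocalDate.of(2024, 1, 7))` gives 7 (a Sunday).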
Day binning
Categorise a day into one of two categories following the Gregorian calendar.
Weekday (Mon-Fri) = 1
Weekend (Sat-Sun) = 2
Example
Attributes schema
Attribute | Description | Type | Required |
---|---|---|---|
source field | The column to perform the calculation upon | Double |
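
The weekday/weekend rule above can be sketched as shown here. The names are hypothetical, and only `java.time.LocalDate` input is handled for brevity.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;

// Hypothetical helper illustrating day binning; not part of the product API.
final class DayBinningSketch {

    // Returns 1 for Monday to Friday and 2 for Saturday or Sunday.
    static int dayBin(LocalDate date) {
        DayOfWeek day = date.getDayOfWeek();
        boolean weekend = day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY;
        return weekend ? 2 : 1;
    }
}
```

For example, `dayBin(LocalDate.of(2024, 1, 5))` (a Friday) gives 1 and `dayBin(LocalDate.of(2024, 1, 6))` (a Saturday) gives 2.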
Age binning
Categorise a passed value, given either as an integer age or as a date object, into one of the pre-configured age bins.
Example
Attributes schema
Attribute | Description | Type | Required |
---|---|---|---|
bins | Array of age bins to use. Default bins are 0-9, 10-19, ..., 110-119 | Int[][] | |
as date | Passed event field is a supported date object. Supported Data classes: | Boolean. Default: false | |
base date | Date used to calculate the age. Defaults to the date the process is started | String. Format: YYYY-MM-DD | |
source field | The column to perform the calculation upon | Double |
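
One way the binning could work is sketched below. Several points are assumptions rather than documented behaviour: each bin is treated as an inclusive `[low, high]` pair, the result is the zero-based index of the matching bin (-1 when none matches), and in the `as date` case the passed date is taken to be a birth date, with the age counted in completed years up to the base date. All names are hypothetical.

```java
import java.time.LocalDate;
import java.time.Period;

// Hypothetical helper illustrating age binning; not part of the product API.
final class AgeBinningSketch {

    // Builds the default bins from the schema: 0-9, 10-19, ..., 110-119.
    static int[][] defaultBins() {
        int[][] bins = new int[12][];
        for (int i = 0; i < bins.length; i++) {
            bins[i] = new int[] {i * 10, i * 10 + 9};
        }
        return bins;
    }

    // Assumption: returns the zero-based index of the first bin containing the age,
    // or -1 if the age falls outside every bin. Bounds are treated as inclusive.
    static int binIndex(int age, int[][] bins) {
        for (int i = 0; i < bins.length; i++) {
            if (age >= bins[i][0] && age <= bins[i][1]) {
                return i;
            }
        }
        return -1;
    }

    // Assumption for "as date": age is the number of completed years
    // between the passed date and the base date.
    static int ageInYears(LocalDate birthDate, LocalDate baseDate) {
        return Period.between(birthDate, baseDate).getYears();
    }
}
```

Under these assumptions, an age of 35 lands in the 30-39 bin (index 3), and a date of 1990-06-15 with a base date of 2024-06-15 yields an age of 34, which falls in the same bin.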