Transform

Generate analytics-ready features from data

Objective

Feature engineering prepares raw data for analysis by deriving new features from existing fields. The following transforms are available:

  1. Log transform: applies a log function to positive values, commonly used to handle skewed data.

  2. Day of week transform: converts a date to its day of the week as a number (1-7).

  3. Day binning: categorises a date as a weekday (1) or weekend (2).

  4. Age binning: categorises ages into specified age ranges for easier analysis.

Each method produces targeted features, simplifying data for analytics.

Log transform

The log transform replaces each value x of the source field with log(x), where x must be greater than zero.

Example

feature engineering:
  ...
  features:
    compute:
      log_spend:
        function:
          log transform:
            source field: spend
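
To illustrate what the configuration above computes, here is a minimal Python sketch of the same idea. It is not the product's implementation. The function name `log_transform`, the choice of the natural logarithm, and returning `None` for non-positive input are all assumptions, since the documentation does not state the log base or the behaviour for invalid values.

```python
import math
from typing import Optional


def log_transform(value: Optional[float]) -> Optional[float]:
    """Return log(value) for positive input.

    Assumptions (not confirmed by the documentation): natural log is used,
    and missing or non-positive values yield None instead of raising.
    """
    if value is None or value <= 0:
        return None
    return math.log(value)


# Hypothetical rows with a "spend" column, mirroring the config example.
rows = [{"spend": 1.0}, {"spend": 100.0}, {"spend": 0.0}]
for row in rows:
    row["log_spend"] = log_transform(row["spend"])
```

Compressing large values this way reduces right skew, which is why the transform only makes sense for strictly positive fields such as spend amounts.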

Attributes schema

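| Attribute    | Description                                | Type   | Required |
|--------------|--------------------------------------------|--------|----------|
| source field | The column to perform the calculation upon | Double |          |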

Day of week transform

Converts the passed date object to its day of the week as a number between 1 and 7, where the week starts on Monday (Monday = 1).

Supported date objects:

  1. java.time.LocalDate

  2. java.sql.Date

  3. org.joda.time.DateTime

Example

feature engineering:
  ...
  features:
    compute:
      day_of_week:
        function:
          day-of-week transform:
            source field: date
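
The numbering described in this section (Monday = 1 through Sunday = 7) matches the ISO 8601 weekday convention. The sketch below shows the same mapping in Python rather than with the Java date classes listed above. The function name `day_of_week` is hypothetical and this is not the product's implementation.

```python
from datetime import date


def day_of_week(d: date) -> int:
    """Return the ISO weekday number: Monday = 1, ..., Sunday = 7."""
    return d.isoweekday()


# 2023-01-02 was a Monday and 2023-01-01 was a Sunday.
monday = day_of_week(date(2023, 1, 2))
sunday = day_of_week(date(2023, 1, 1))
```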

Attributes schema

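| Attribute    | Description                                | Type   | Required |
|--------------|--------------------------------------------|--------|----------|
| source field | The column to perform the calculation upon | Double |          |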

Day binning

Categorises a date into one of two categories, following the Gregorian calendar:

  1. Weekday (Mon-Fri) = 1

  2. Weekend (Sat-Sun) = 2

Example

feature engineering:
  ...
  features:
    compute:
      day_bin:
        function:
          day binning:
            source field: date
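
As a rough illustration of the weekday/weekend split described in this section, the Python sketch below maps Monday to Friday to 1 and Saturday and Sunday to 2. The function name `day_bin` is hypothetical, and the sketch only mirrors the documented categories rather than the product's actual code.

```python
from datetime import date


def day_bin(d: date) -> int:
    """Return 1 for a weekday (Mon-Fri) and 2 for a weekend day (Sat-Sun)."""
    # isoweekday() numbers Monday as 1 and Sunday as 7.
    return 1 if d.isoweekday() <= 5 else 2


friday = day_bin(date(2023, 1, 6))    # a Friday
saturday = day_bin(date(2023, 1, 7))  # a Saturday
```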

Attributes schema

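| Attribute    | Description                                | Type   | Required |
|--------------|--------------------------------------------|--------|----------|
| source field | The column to perform the calculation upon | Double |          |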

Age binning

Categorises a passed age, given either as an integer or as a date object, into one of the pre-configured age bins.

Example

feature engineering:
  ...
  features:
    compute:
      age_bin:
        function:
          age binning:
            bins: [[0, 18], [19, 21], [22, 40], [41, 55], [56, 76]]
            base date: 2023-01-01
            source field: current_age
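
The following Python sketch shows one way the binning in the example above could work, using the same bins and base date. Several details are assumptions, because the documentation does not specify them: that bin bounds are inclusive at both ends, that the result is a "low-high" label, that ages outside every bin yield `None`, and that a date input is treated as a birth date whose age is counted in whole years up to the base date. The names `age_in_years` and `age_bin` are hypothetical.

```python
from datetime import date
from typing import Optional, Sequence, Tuple, Union

# Bins and base date taken from the configuration example.
BINS = [(0, 18), (19, 21), (22, 40), (41, 55), (56, 76)]
BASE_DATE = date(2023, 1, 1)


def age_in_years(birth: date, base: date) -> int:
    """Whole years elapsed between a birth date and the base date."""
    years = base.year - birth.year
    # Subtract one year if the birthday has not yet occurred in the base year.
    if (base.month, base.day) < (birth.month, birth.day):
        years -= 1
    return years


def age_bin(
    value: Union[int, float, date],
    bins: Sequence[Tuple[int, int]] = BINS,
    base: date = BASE_DATE,
) -> Optional[str]:
    """Return a 'low-high' label for the bin containing the age, else None.

    Assumption: bounds are inclusive and the label format is illustrative.
    """
    age = age_in_years(value, base) if isinstance(value, date) else int(value)
    for low, high in bins:
        if low <= age <= high:
            return f"{low}-{high}"
    return None


from_int = age_bin(20)                   # age passed directly
from_date = age_bin(date(2000, 6, 15))   # age derived from a date
```

Passing a date corresponds to the `as date` attribute being enabled, in which case the base date determines the age that gets binned.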

Attributes schema

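| Attribute    | Description                                                                                                                      | Type                        | Required |
|--------------|----------------------------------------------------------------------------------------------------------------------------------|-----------------------------|----------|
| bins         | Array of age bins to use. Default bins are 0-9, 10-19, ..., 110-119.                                                             | Int[][]                     |          |
| as date      | Whether the passed event field is a supported date object. Supported date classes: java.time.LocalDate, java.sql.Date.           | Boolean (default: false)    |          |
| base date    | The date used to calculate the age. Defaults to the date the process is started.                                                 | String (format: YYYY-MM-DD) |          |
| source field | The column to perform the calculation upon                                                                                       | Double                      |          |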
