Transform

Generate analytics-ready features from data

Objective

Feature engineering prepares raw data for analysis by deriving new features from it. The following transforms are available:

  1. Log transform: applies a log function to positive values, commonly used to handle skewed data.

  2. Day of week transform: converts a date to its day of the week as a number (1-7).

  3. Day binning: categorises a date as a weekday (1) or weekend (2).

  4. Age binning: categorises ages into specified age ranges for easier analysis.

Each method produces targeted features, simplifying data for analytics.

Log transform

The log transform replaces each value x of the source field with log(x). It is defined only for positive values (x > 0).

Attributes schema

Attribute    | Description                                | Type   | Required
source field | The column to perform the calculation upon | Double |

Example

feature engineering:
  ...
  features:
    compute:
      log_spend:
        function:
          log transform:
            source field: spend
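
To illustrate the semantics only (this is not the product's implementation), the transform can be sketched in Python. The function name `log_transform` is hypothetical, and the natural logarithm is an assumption, since the base of the log is not stated here.

```python
import math


def log_transform(x: float) -> float:
    """Return log(x) for a strictly positive x.

    Assumption: natural logarithm; the documentation does not state the base.
    """
    if x <= 0:
        # The transform is only defined for values greater than zero.
        raise ValueError("log transform requires a value greater than zero")
    return math.log(x)


# Skewed spend values are compressed onto a much narrower scale.
spend = [1.0, 10.0, 100.0, 1000.0]
log_spend = [log_transform(value) for value in spend]
```

How non-positive values are handled by the actual transform is not specified; raising an error is just one option chosen for this sketch.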

Day of week transform

Converts the passed date object to its day of the week as a number between 1 and 7, where the week starts on Monday (Monday = 1, Sunday = 7).

Supported date objects:

  • java.time.LocalDate

  • java.sql.Date

  • org.joda.time.DateTime

Attributes schema

Attribute    | Description                                | Type   | Required
source field | The column to perform the calculation upon | Double |

Example

feature engineering:
  ...
  features:
    compute:
      day_of_week:
        function:
          day-of-week transform:
            source field: date
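
The Monday = 1 numbering matches the ISO 8601 weekday convention. A minimal Python sketch of the mapping, using a hypothetical `day_of_week` helper and the standard library's `datetime.date` in place of the Java date classes listed above:

```python
from datetime import date


def day_of_week(d: date) -> int:
    """Return the ISO weekday number: Monday = 1 through Sunday = 7."""
    return d.isoweekday()


monday = day_of_week(date(2023, 1, 2))  # 2 January 2023 was a Monday
sunday = day_of_week(date(2023, 1, 1))  # 1 January 2023 was a Sunday
```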

Day binning

Categorises a date into one of two categories, following the Gregorian calendar:

  1. Weekday (Mon-Fri) = 1.

  2. Weekend (Sat-Sun) = 2.

Attributes schema

Attribute    | Description                                | Type   | Required
source field | The column to perform the calculation upon | Double |

Example

feature engineering:
  ...
  features:
    compute:
      day_bin:
        function:
          day binning:
            source field: date
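
The weekday/weekend rule can be sketched in Python as follows. The helper name `day_bin` is hypothetical, and the sketch only mirrors the two categories described above; it is not the product's implementation.

```python
from datetime import date

WEEKDAY = 1  # Monday to Friday
WEEKEND = 2  # Saturday and Sunday


def day_bin(d: date) -> int:
    """Return 1 for Monday-Friday and 2 for Saturday-Sunday."""
    # isoweekday() numbers the days Monday = 1 through Sunday = 7.
    return WEEKDAY if d.isoweekday() <= 5 else WEEKEND


friday = day_bin(date(2023, 1, 6))    # a Friday
saturday = day_bin(date(2023, 1, 7))  # a Saturday
```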

Age binning

Assigns an age to one of the pre-configured age bins. The age can be passed either as an integer or as a date object, in which case the age is calculated relative to the base date.

Attributes schema

Attribute    | Description                                                                                                   | Type                       | Required
bins         | Array of age bins to use. Default bins are 0-9, 10-19, ..., 110-119                                           | Int[][]                    |
as date      | Passed event field is a supported date object. Supported date classes: java.time.LocalDate, java.sql.Date     | Boolean. Default: false    |
base date    | The date used to calculate the age. Defaults to the date the process is started                               | String. Format: YYYY-MM-DD |
source field | The column to perform the calculation upon                                                                    | Double                     |

Example

feature engineering:
  ...
  features:
    compute:
      age_bin:
        function:
          age binning:
            bins: [[0, 18], [19, 21], [22, 40], [41, 55], [56, 76]]
            base date: 2023-01-01
            source field: current_age
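
A rough Python sketch of how the binning could work is shown here, under several assumptions that are not confirmed by this page: bin bounds are treated as inclusive on both ends, the result is the matching `(low, high)` pair, ages outside every bin yield `None`, and a date input is interpreted as a date of birth. The names `age_bin`, `age_in_years` and `DEFAULT_BINS` are hypothetical.

```python
from datetime import date
from typing import Optional, Sequence, Tuple, Union

# Default bins described above: 0-9, 10-19, ..., 110-119.
DEFAULT_BINS = [(low, low + 9) for low in range(0, 120, 10)]


def age_in_years(birth: date, base: date) -> int:
    """Whole years elapsed between birth and base."""
    years = base.year - birth.year
    # Subtract one year if the birthday has not yet occurred in the base year.
    if (base.month, base.day) < (birth.month, birth.day):
        years -= 1
    return years


def age_bin(
    value: Union[int, float, date],
    bins: Sequence[Tuple[int, int]] = DEFAULT_BINS,
    as_date: bool = False,
    base_date: Optional[date] = None,
) -> Optional[Tuple[int, int]]:
    """Return the (low, high) bin containing the age, or None if no bin matches.

    Assumptions: inclusive bounds, and a date input is a date of birth.
    """
    if as_date:
        # Mirrors the "base date" default: the current date when none is given.
        base = base_date if base_date is not None else date.today()
        age = age_in_years(value, base)
    else:
        age = int(value)
    for low, high in bins:
        if low <= age <= high:
            return (low, high)
    return None


example_bins = [(0, 18), (19, 21), (22, 40), (41, 55), (56, 76)]
from_int = age_bin(20, bins=example_bins)
from_date = age_bin(date(2000, 6, 15), as_date=True, base_date=date(2023, 1, 1))
```

With the example bins, an age of 20 falls in the 19-21 bin. A date of 2000-06-15 with a base date of 2023-01-01 gives an age of 22, which falls in the default 20-29 bin.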
