Synthetic Data Generator (Numeric)


Version v1.0 (latest), created on

This component generates synthetic values for a numeric column by sampling from a selected distribution (Uniform, Gaussian, or Gamma) whose parameters are derived from the original column. It can also generate synthetic data from separate distributions for different subsets of the data (dependency groups), which are defined by one or more dependency columns. Each synthetic value can be matched to its row in the original data through the row ID. In addition, dependency groups with too few examples can be excluded from data generation, and random noise can be added to the synthetic data. Synthetic data generation is useful, for example, when the original data is confidential (anonymization) or is difficult or expensive to collect.
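The sampling step described above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the component's implementation: the estimators (min/max for Uniform, sample mean and standard deviation for Gaussian, method of moments for Gamma), the noise model (additive zero-mean Gaussian noise with standard deviation `noise_std`), and the function names `estimate_params`, `draw`, and `synthesize` are all hypothetical.

```python
import random
import statistics
from typing import Dict, Optional


def estimate_params(values, distribution):
    """Estimate parameters of the chosen distribution from the original values.

    Assumed estimators (for illustration only): min/max for Uniform,
    sample mean/standard deviation for Gaussian, and the method of moments
    for Gamma (shape = mean^2 / variance, scale = variance / mean).
    """
    if distribution == "uniform":
        return {"low": min(values), "high": max(values)}
    mean = statistics.mean(values)
    if distribution == "gaussian":
        return {"mean": mean, "std": statistics.stdev(values)}
    if distribution == "gamma":
        var = statistics.variance(values)
        if mean <= 0 or var <= 0:
            raise ValueError("Gamma fit needs a positive mean and variance")
        return {"shape": mean * mean / var, "scale": var / mean}
    raise ValueError(f"unknown distribution: {distribution}")


def draw(params, distribution, rng):
    """Draw one synthetic value from the fitted distribution."""
    if distribution == "uniform":
        return rng.uniform(params["low"], params["high"])
    if distribution == "gaussian":
        return rng.gauss(params["mean"], params["std"])
    # random.Random.gammavariate takes (alpha=shape, beta=scale)
    return rng.gammavariate(params["shape"], params["scale"])


def synthesize(column: Dict[str, float], distribution: str,
               noise_std: float = 0.0, seed: Optional[int] = None) -> Dict[str, float]:
    """Return one synthetic value per original row ID, optionally with added noise."""
    rng = random.Random(seed)
    params = estimate_params(list(column.values()), distribution)
    synthetic = {}
    for row_id in column:
        value = draw(params, distribution, rng)
        if noise_std > 0:
            value += rng.gauss(0.0, noise_std)  # hypothetical additive noise model
        synthetic[row_id] = value
    return synthetic
```

For example, `synthesize({"Row0": 4.2, "Row1": 5.1, "Row2": 3.8, "Row3": 6.0}, "gamma", noise_std=0.1, seed=1)` returns a dictionary with the same four row IDs and one synthetic value for each. Fixing `seed` makes the output reproducible.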

Component details


Input ports

Type: Table
Original Data
The original numeric column and, optionally, one or more dependency columns

Output ports

Type: Table
Synthetic Data
The synthetic numeric column together with the original row IDs
Type: Table
Statistics Table
The distribution parameters derived from the original numeric column, computed separately for each dependency group when dependency columns are used
Type: Table
Input Data Dependency Group Mapping
The dependency group ID of each row ID in the original data
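The relationship between the dependency columns, the Statistics Table, and the Input Data Dependency Group Mapping can be illustrated with a short sketch. It assumes a hypothetical input format (a mapping from row ID to a dictionary of column values), numbers groups in order of first appearance, uses Gaussian parameters only, and treats `min_group_size` as the exclusion threshold; none of these details are confirmed by the component's documentation, and `group_statistics` is a hypothetical name.

```python
import statistics
from typing import Dict, List, Tuple


def group_statistics(rows, target, dependencies, min_group_size=2):
    """Assign dependency-group IDs and compute per-group Gaussian parameters.

    rows: mapping row ID -> dict of column values (hypothetical input format).
    Returns (mapping, stats): mapping is row ID -> group ID; stats is
    group ID -> {"count", "mean", "std"} for groups large enough to keep.
    """
    group_ids: Dict[Tuple, int] = {}
    mapping: Dict[str, int] = {}
    members: Dict[int, List[float]] = {}
    for row_id, row in rows.items():
        # One group per distinct combination of dependency-column values
        key = tuple(row[c] for c in dependencies)
        gid = group_ids.setdefault(key, len(group_ids))
        mapping[row_id] = gid
        members.setdefault(gid, []).append(row[target])
    stats = {}
    for gid, values in members.items():
        # Groups with too few examples are excluded; stdev needs at least two values
        if len(values) < max(min_group_size, 2):
            continue
        stats[gid] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "std": statistics.stdev(values),
        }
    return mapping, stats
```

In this sketch every row still receives a group ID in the mapping, while only the groups that pass the size threshold appear in the statistics and would be used for sampling.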

Legal

By using or downloading the component, you agree to our terms and conditions.