Synthetic Data Augmentation with Copulas

Version 4.0 (Latest)

The University of Saskatchewan
Ph.D. in Interdisciplinary Studies

Created by: Carlos Enrique Diaz, MBM, P.Eng.
Email: carlos.diaz@usask.ca

Supervisor: Lori Bradford, Ph.D.
Email: lori.bradford@usask.ca

Description:

This workflow demonstrates how to assess the quality of synthetic data generated using the Synthetic Data (Copulas) component in KNIME. It uses the well-known Iris dataset as a reference.

Section 1: Original Data Analysis with 150 Observations

Loads and preprocesses the Iris dataset (150 rows).
Uses Linear Correlation and Statistics nodes to explore the original data’s structure and relationships.
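Outside KNIME, the same kind of exploration can be approximated in a few lines of Python. The sketch below is only an illustration of what summary statistics and a linear (Pearson) correlation measure; it is not the implementation of the KNIME nodes. The column names and the eight sample values are placeholders chosen for the example, not the full 150-row Iris table.

```python
from statistics import mean, stdev


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Illustrative placeholder values only (assumption: two numeric columns).
columns = {
    "sepal_length": [5.1, 4.9, 6.3, 5.8, 7.0, 6.4, 5.0, 6.9],
    "petal_length": [1.4, 1.5, 6.0, 5.1, 4.7, 4.5, 1.3, 4.9],
}

# Per-column descriptive statistics, similar in spirit to a statistics summary.
summary = {
    name: {"min": min(v), "max": max(v), "mean": mean(v), "std": stdev(v)}
    for name, v in columns.items()
}

# Pairwise linear correlation between the two columns.
r = pearson(columns["sepal_length"], columns["petal_length"])

for name, stats in summary.items():
    print(name, stats)
print("Pearson r:", round(r, 3))
```

Running the same code on the synthetic table gives numbers that can be compared column by column with the original.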

Section 2: Synthetic Data with 500 Observations

Generates 500 synthetic rows using the Synthetic Data (Copulas) component.
Applies the same analysis nodes to compare the synthetic dataset with the original.
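For readers who want to see the general idea behind copula-based generation, here is a minimal Gaussian copula sketch in plain Python. It is an assumption that a Gaussian copula with empirical marginals is a fair stand-in; the internals of the Synthetic Data (Copulas) component are not described here and may differ. All function names and the sample values are hypothetical, and the sketch assumes the estimated correlation matrix is positive definite.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the copula transform


def normal_scores(column):
    """Map values to standard-normal scores through their ranks."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cholesky(m):
    """Lower-triangular factor of a positive definite matrix."""
    d = len(m)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = (m[i][i] - s) ** 0.5
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def empirical_quantile(sorted_col, u):
    """Inverse empirical CDF with linear interpolation, u in [0, 1]."""
    pos = u * (len(sorted_col) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_col) - 1)
    return sorted_col[lo] + (pos - lo) * (sorted_col[hi] - sorted_col[lo])


def sample_gaussian_copula(columns, n_rows, seed=0):
    """Draw n_rows synthetic rows that mimic the columns' dependence."""
    rng = random.Random(seed)
    d = len(columns)
    scores = [normal_scores(c) for c in columns]
    corr = [[1.0 if i == j else pearson(scores[i], scores[j])
             for j in range(d)] for i in range(d)]
    low = cholesky(corr)
    sorted_cols = [sorted(c) for c in columns]
    rows = []
    for _ in range(n_rows):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Correlate the independent normals, then map back to data scale.
        zc = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        rows.append([empirical_quantile(sorted_cols[i], STD_NORMAL.cdf(zc[i]))
                     for i in range(d)])
    return rows


# Illustrative placeholder values only, not the full Iris table.
sepal = [5.1, 4.9, 6.3, 5.8, 7.0, 6.4, 5.0, 6.9]
petal = [1.4, 1.5, 6.0, 5.1, 4.7, 4.5, 1.3, 4.9]

synthetic = sample_gaussian_copula([sepal, petal], 500)
print("original r: ", round(pearson(sepal, petal), 3))
print("synthetic r:", round(pearson([row[0] for row in synthetic],
                                    [row[1] for row in synthetic]), 3))
```

Because the marginals are inverted through the empirical distribution, synthetic values stay within the observed range of each column; comparing the two printed correlations is a rough analogue of applying the same analysis nodes to both tables.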

Section 3: Mixed Data Visualization

Combines real and synthetic data in a single 3D plot of three features.
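The data preparation behind such a mixed plot can be sketched as follows: tag each row with its origin, pick three features, and hand the labelled points to any 3D scatter tool. The feature names, the helper `mixed_points`, and all row values (including the "synthetic" ones) are hypothetical placeholders for illustration, not output of the workflow.

```python
FEATURES = ("sepal_length", "sepal_width", "petal_length", "petal_width")

# Placeholder rows in FEATURES order (illustrative values only).
real_rows = [
    (5.1, 3.5, 1.4, 0.2),
    (6.3, 3.3, 6.0, 2.5),
    (7.0, 3.2, 4.7, 1.4),
]
synthetic_rows = [
    (5.3, 3.4, 1.6, 0.3),
    (6.6, 3.0, 5.2, 1.9),
]


def mixed_points(real, synthetic, axes):
    """Return one labelled (x, y, z) point per row for the chosen features."""
    idx = [FEATURES.index(name) for name in axes]
    points = []
    for source, rows in (("real", real), ("synthetic", synthetic)):
        for row in rows:
            x, y, z = (row[i] for i in idx)
            points.append({"x": x, "y": y, "z": z, "source": source})
    return points


points = mixed_points(real_rows, synthetic_rows,
                      axes=("sepal_length", "petal_length", "petal_width"))
for p in points:
    print(p)
```

Colouring the points by the `source` field is what makes overlap, or lack of it, between real and synthetic data visible in the plot.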

This workflow is a simple and effective way to visualize and compare the statistical quality of synthetic data using built-in KNIME nodes.
