The University of Saskatchewan
Ph.D. in Interdisciplinary Studies
Created by: Carlos Enrique Diaz, MBM, P.Eng.
Email: carlos.diaz@usask.ca
Supervisor: Lori Bradford, Ph.D.
Email: lori.bradford@usask.ca
Description:
This workflow demonstrates how to assess the quality of synthetic data generated using the Synthetic Data (Copulas) component in KNIME. It uses the well-known Iris dataset as a reference.
Section 1: Original Data Analysis with 150 Observations
Loads and preprocesses the Iris dataset (150 rows).
Uses Linear Correlation and Statistics nodes to explore the original data’s structure and relationships.
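For readers who want to see what this step measures outside KNIME, the following is a minimal Python sketch of per-column summary statistics and a Pearson correlation matrix. It is not the implementation of the KNIME nodes. The names COLUMNS, ROWS, pearson, summarize and correlation_matrix are hypothetical, and ROWS is a small illustrative stand-in in the four-column Iris layout; the full 150-row table would be substituted when reproducing the workflow.

```python
import math
import statistics

# Hypothetical column names and a small stand-in table (not the full dataset).
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
ROWS = [
    (5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2), (4.7, 3.2, 1.3, 0.2), (4.6, 3.1, 1.5, 0.2),
    (7.0, 3.2, 4.7, 1.4), (6.4, 3.2, 4.5, 1.5), (6.9, 3.1, 4.9, 1.5), (5.5, 2.3, 4.0, 1.3),
    (6.3, 3.3, 6.0, 2.5), (5.8, 2.7, 5.1, 1.9), (7.1, 3.0, 5.9, 2.1), (6.3, 2.9, 5.6, 1.8),
]


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length columns."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def summarize(rows, names):
    """Return min, max, mean and sample standard deviation for each column."""
    columns = list(zip(*rows))
    return {
        name: {
            "min": min(col),
            "max": max(col),
            "mean": statistics.fmean(col),
            "std": statistics.stdev(col),
        }
        for name, col in zip(names, columns)
    }


def correlation_matrix(rows):
    """Return the square matrix of pairwise Pearson correlations."""
    columns = list(zip(*rows))
    return [[pearson(a, b) for b in columns] for a in columns]


stats = summarize(ROWS, COLUMNS)
corr = correlation_matrix(ROWS)
for name, row in zip(COLUMNS, corr):
    print(name, [round(value, 2) for value in row])
```

Running the same two functions on the original and on the synthetic table gives two sets of numbers that can be compared side by side.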
Section 2: Synthetic Data with 500 Observations
Generates 500 synthetic rows using the Synthetic Data (Copulas) component.
Applies the same Linear Correlation and Statistics nodes to the synthetic dataset so that its results can be compared with those of the original.
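To illustrate the general idea behind copula-based generation, here is a minimal Python sketch of a Gaussian copula sampler. The internals of the Synthetic Data (Copulas) component are not described here, so the choice of a Gaussian copula, the rank-based marginals and every name in the block (ROWS, normal_scores, cholesky, empirical_quantile, sample_gaussian_copula) are assumptions made for illustration only. ROWS is a small stand-in table in the Iris column layout, not the full dataset.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Small stand-in table (sepal length, sepal width, petal length, petal width).
ROWS = [
    (5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2), (4.7, 3.2, 1.3, 0.2),
    (7.0, 3.2, 4.7, 1.4), (6.4, 3.2, 4.5, 1.5), (6.9, 3.1, 4.9, 1.5),
    (6.3, 3.3, 6.0, 2.5), (5.8, 2.7, 5.1, 1.9), (7.1, 3.0, 5.9, 2.1),
]


def normal_scores(column):
    """Map a column to standard-normal scores via ranks (ties broken by row order)."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                # The clamp guards against rounding when columns are nearly collinear.
                lower[i][j] = math.sqrt(max(matrix[i][i] - s, 1e-12))
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def empirical_quantile(sorted_values, u):
    """Linearly interpolated quantile of the observed values at probability u."""
    pos = u * (len(sorted_values) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def sample_gaussian_copula(rows, n_samples, seed=0):
    """Draw synthetic rows that mimic the marginals and rank dependence of rows."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]
    scores = [normal_scores(col) for col in columns]
    corr = [[correlation(a, b) for b in scores] for a in scores]
    lower = cholesky(corr)
    sorted_columns = [sorted(col) for col in columns]
    synthetic = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in columns]
        # Multiply by the Cholesky factor to impose the fitted correlation.
        correlated = [
            sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(len(columns))
        ]
        synthetic.append(tuple(
            empirical_quantile(sorted_columns[i], STD_NORMAL.cdf(v))
            for i, v in enumerate(correlated)
        ))
    return synthetic


synthetic = sample_gaussian_copula(ROWS, 500, seed=42)
print(len(synthetic), synthetic[0])
```

Because the marginals are rebuilt from interpolated empirical quantiles, every synthetic value in this sketch stays within the observed range of its column. A parametric marginal fit would behave differently at the tails.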
Section 3: Mixed Data Visualization
Plots the real and synthetic data together in a 3D plot of three selected features.
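A comparable mixed view can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of the KNIME view: the names REAL, SYNTHETIC, merge_with_source and plot_mixed_3d are hypothetical, the SYNTHETIC rows are made-up placeholders rather than generated output, and the plotting helper assumes matplotlib is installed.

```python
# Placeholder rows (sepal length, sepal width, petal length, petal width).
REAL = [
    (5.1, 3.5, 1.4, 0.2), (7.0, 3.2, 4.7, 1.4), (6.3, 3.3, 6.0, 2.5),
]
SYNTHETIC = [
    (5.0, 3.3, 1.5, 0.3), (6.7, 3.0, 4.6, 1.4), (6.5, 3.1, 5.7, 2.2),
    (5.9, 2.9, 4.2, 1.3),
]


def merge_with_source(real_rows, synthetic_rows):
    """Stack both tables and tag each row with its origin as the first field."""
    mixed = [("real", *row) for row in real_rows]
    mixed += [("synthetic", *row) for row in synthetic_rows]
    return mixed


def plot_mixed_3d(mixed, feature_indices, feature_names, path="mixed_3d.png"):
    """Save a 3D scatter plot of three features, one marker style per source."""
    # Imported here so that merge_with_source works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    for source, marker in (("real", "o"), ("synthetic", "^")):
        points = [row[1:] for row in mixed if row[0] == source]
        xs = [p[feature_indices[0]] for p in points]
        ys = [p[feature_indices[1]] for p in points]
        zs = [p[feature_indices[2]] for p in points]
        ax.scatter(xs, ys, zs, marker=marker, label=source)
    ax.set_xlabel(feature_names[0])
    ax.set_ylabel(feature_names[1])
    ax.set_zlabel(feature_names[2])
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
    return path


mixed = merge_with_source(REAL, SYNTHETIC)
print(len(mixed), mixed[0])
```

Calling plot_mixed_3d(mixed, (0, 2, 3), ("sepal length", "petal length", "petal width")) would write the figure to mixed_3d.png. Synthetic points that fall far outside the cloud of real points are a quick visual hint that the generator has not captured the joint structure.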
Together, these three sections provide a simple way to visualize and compare the statistical quality of synthetic data against the original using built-in KNIME nodes.