Synthetic Data (Copulas)

Tags: Data augmentation, Copulas, Synthetic data, Data generation, Copula

Version 3.0 (latest), created on Jun 11, 2025 5:50 PM

Hub profile: carlosenrique84

The University of Saskatchewan

Ph.D. in Interdisciplinary Studies

Created by: Carlos Enrique Diaz, MBM, B.Eng.

Email: carlos.diaz@usask.ca

Supervisor: Lori Bradford, Ph.D.

Email: lori.bradford@usask.ca

Description:

This KNIME component generates synthetic tabular data using copula-based multivariate models, preserving both marginal distributions and inter-variable dependencies with the help of the Python copulas library.
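The idea of keeping marginals and dependencies separate can be illustrated with a small from-scratch sketch. This is not the component's code and not how the copulas library works internally (the library fits parametric or kernel marginals); it is a simplified two-column Gaussian copula with empirical marginals, using only the Python standard library. The function names `normal_scores`, `empirical_quantile`, and `sample_gaussian_copula` are hypothetical, and ties in the input are ranked arbitrarily.

```python
import random
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sigma 1


def normal_scores(column):
    """Map a column to standard-normal scores through its ranks."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) stays strictly inside (0, 1), as inv_cdf requires
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def empirical_quantile(sorted_values, u):
    """Pick the observed value at quantile u (0 <= u <= 1)."""
    index = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]


def sample_gaussian_copula(x, y, n_rows, seed=0):
    """Draw (x, y) pairs that reuse the observed marginals and rank dependence."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two columns of equal length with at least 2 rows")
    zx, zy = normal_scores(x), normal_scores(y)
    # Both score vectors hold the same symmetric set of values (mean ~0),
    # so their correlation reduces to this ratio.
    rho = sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)
    rho = max(-1.0, min(1.0, rho))  # guard against rounding error
    rng = random.Random(seed)
    sx, sy = sorted(x), sorted(y)
    rows = []
    for _ in range(n_rows):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        rows.append((empirical_quantile(sx, STD_NORMAL.cdf(z1)),
                     empirical_quantile(sy, STD_NORMAL.cdf(z2))))
    return rows
```

Because values are drawn from the sorted observed columns, every sampled value in this sketch already exists in the input; a parametric marginal, as offered by the component, can instead produce values outside the observed range.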

In the Open View (F10), the component displays Spearman and Pearson correlograms for both the original and synthetic datasets, colour-coded from red (-1) to blue (1) for quick visual comparison.
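For readers who want the numbers behind a correlogram cell, the two coefficients can be computed as in the sketch below. It is a standard-library illustration, not the component's implementation (the component draws its correlograms with R's corrplot); the helper names `pearson`, `average_ranks`, and `spearman` are chosen here for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Computing both coefficients for the same column pair in the original and in the synthetic table, and comparing them, gives the numeric counterpart of the visual check. Pearson measures linear association, while Spearman only depends on the ordering of the values.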

Configuration Options:

Multivariate Distribution:

Choose between two copula-based modelling approaches:

  • Gaussian Copula

  • Vine Copula

Univariate Distribution (Only for Gaussian Copula):

Select the marginal distribution for each numeric column:

  • GaussianUnivariate (Default)

  • BetaUnivariate

  • GammaUnivariate

  • GaussianKDE

  • TruncatedGaussian

Vine Type (Only for Vine Copula):

Choose the vine structure:

  • Center

  • Regular

  • Direct
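The options above correspond to classes in the Python copulas library. The sketch below shows one way such a configuration could be turned into a model; `build_model` and the two option tuples are hypothetical helpers written for this illustration, and the component's internal script may be organised differently. It assumes the copulas package is installed, and imports it lazily so that option validation runs first.

```python
UNIVARIATE_OPTIONS = (
    "GaussianUnivariate",
    "BetaUnivariate",
    "GammaUnivariate",
    "GaussianKDE",
    "TruncatedGaussian",
)
VINE_TYPES = ("center", "regular", "direct")


def build_model(multivariate="Gaussian Copula",
                univariate="GaussianUnivariate",
                vine_type="center"):
    """Return an unfitted copulas model for the chosen options (hypothetical helper)."""
    if multivariate == "Gaussian Copula":
        if univariate not in UNIVARIATE_OPTIONS:
            raise ValueError(f"unknown univariate distribution: {univariate!r}")
        # Lazy imports: the copulas package is only needed from here on.
        import copulas.univariate as cu
        from copulas.multivariate import GaussianMultivariate
        # The same marginal class is applied to every modelled column.
        return GaussianMultivariate(distribution=getattr(cu, univariate))
    if multivariate == "Vine Copula":
        vine = vine_type.lower()
        if vine not in VINE_TYPES:
            raise ValueError(f"unknown vine type: {vine_type!r}")
        from copulas.multivariate import VineCopula
        return VineCopula(vine)
    raise ValueError(f"unknown multivariate distribution: {multivariate!r}")
```

With a pandas DataFrame `df` restricted to the selected numeric columns, a typical use would be `model = build_model("Gaussian Copula", "GaussianKDE")`, then `model.fit(df)` and `synthetic = model.sample(1000)`, where 1000 stands in for the synthetic sample size.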

Synthetic Sample Size:

Number of synthetic rows to generate.

Deactivate Correlogram View for Faster Running:

Disables the interactive correlogram view to speed up large-scale or automated executions. Keeping the view active is recommended only during initial visual validation.

Numeric Columns:

Select the numeric features to model and synthesize.

Key Feature – Real-Value Substitution:

To enhance realism, each synthetic numeric value is replaced by the closest real value found in the original dataset.

  • This post-processing step ensures all values stay within domain-valid ranges.

  • The resulting table with real-value substitution is available in Port 1.

  • The raw synthetic data (possibly outside the original range) is available in Port 2.
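The substitution step described above can be sketched for a single column as follows. This is an illustration under assumptions, not the component's code: it assumes matching is done per column, that distance is the absolute difference, and that ties go to the lower value; the function name `snap_to_real` is hypothetical.

```python
from bisect import bisect_left


def snap_to_real(synthetic, real):
    """Replace each synthetic value with the closest value observed in `real`."""
    pool = sorted(set(real))
    if not pool:
        raise ValueError("the original column is empty")
    snapped = []
    for value in synthetic:
        i = bisect_left(pool, value)  # first position with pool[i] >= value
        if i == 0:
            snapped.append(pool[0])   # below the observed minimum
        elif i == len(pool):
            snapped.append(pool[-1])  # above the observed maximum
        else:
            lo, hi = pool[i - 1], pool[i]
            # assumption: ties are resolved towards the lower neighbour
            snapped.append(lo if value - lo <= hi - value else hi)
    return snapped
```

For example, with observed values 10, 20, and 35, a synthetic value of -3 becomes 10 and a synthetic value of 99 becomes 35, which is how out-of-range raw samples are pulled back into the observed domain. Sorting the pool once and using binary search keeps the cost at O(log n) per synthetic value.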

Use Cases:

  • Data anonymization and privacy preservation

  • Machine learning pipeline testing

  • Prototyping with realistic mock data

  • Secure exploration of sensitive datasets

Requirements:

  • Python environment with the copulas library installed

  • R environment with the corrplot package installed

Component details

Input ports
  1. Type: Table
    Port 1
    No description available
Output ports
  1. Type: Table
    Port 1
    No description available
  2. Type: Table
    Port 2
    No description available

External resources

  • Workflow Example

Used extensions & nodes

Created with KNIME Analytics Platform version 5.4.2
  • KNIME Base nodes (trusted extension), KNIME AG, Zurich, Switzerland, version 5.4.1
  • KNIME Expressions (trusted extension), KNIME AG, Zurich, Switzerland, version 5.4.1
  • KNIME Interactive R Statistics Integration (trusted extension), KNIME AG, Zurich, Switzerland, version 5.4.0
  • KNIME Python Integration (trusted extension), KNIME AG, Zurich, Switzerland, version 5.4.1
  • KNIME Quick Forms (trusted extension), KNIME AG, Zurich, Switzerland, version 5.4.1
  • KNIME Views (trusted extension), KNIME AG, Zurich, Switzerland, version 5.4.2

