Node / Manipulator

Correlation Concatenation

Scientific Strategy / Market Simulation

The Correlation Concatenation node takes any number of Input Correlation Matrices and joins them into a single Output Correlation Matrix. The user can specify the degree of Cross Correlation each Matrix will have with the other Matrices when they are joined.

Concatenating Correlation Matrices is useful when the Horizontal Differentiation of each Feature has been independently generated but some Correlation is known to exist between the Features. For example, if 'Style', 'Color', and 'Ambience' Features were independently generated, then the Correlation Concatenation node could join these three Features together with some Cross Correlation.

Often there is also a relationship between Elements in different Matrices that depends upon the position of each Element within its Matrix. For example, travelers who stay at a luxury hotel will typically appreciate every aspect of that luxury wherever it is found. Hence, travelers who value the best 'Room' are also more likely to value the best 'Entertainment' and the best 'Food'. The top Elements of the Matrices therefore have more Cross Correlation with one another than other Element combinations. Similarly, economy travelers who do not place a high value on a good 'Room' are also unlikely to place a high value on 'Entertainment' and 'Food'.

Typically the Matrix:Matrix Correlation will be modest (less than 0.5). Large Matrix:Matrix Correlations will require the Output Correlation Matrix to be repaired (see the 'Output Correlation Repaired Matrix' and the 'Output Correlation Error Matrix'). If large Feature Correlations are required then consider using the Differentiation Horizontal node instead.

All of the row and column names must be unique across all input tables, otherwise the Matrices cannot be joined. If a specific 'Order' is not provided in an Input Matrix then the row index is used for matching Elements.
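
The joining step described above can be sketched in Python with NumPy. This is only an illustrative sketch, not the node's implementation: the node's actual formula is not documented here, and the function name `concatenate_correlations` and the parameters `cross` and `matched_bonus` are hypothetical. It assumes a single Matrix:Matrix Cross Correlation for all off-diagonal blocks, plus a small extra correlation for Elements that share the same position (Order) in their respective Matrices.

```python
import numpy as np

def concatenate_correlations(matrices, cross=0.2, matched_bonus=0.1):
    """Join square correlation matrices into one block matrix.

    cross         -- assumed correlation between Elements of different matrices
    matched_bonus -- assumed extra correlation when two Elements share the
                     same position (Order) in their respective matrices
    """
    sizes = [m.shape[0] for m in matrices]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    total = int(offsets[-1])
    # Start with the cross correlation everywhere.
    out = np.full((total, total), cross, dtype=float)
    # Position (Order) of every Element within its own matrix.
    order = np.concatenate([np.arange(n) for n in sizes])
    # Elements in the same position get a higher cross correlation.
    out[order[:, None] == order[None, :]] += matched_bonus
    # Diagonal blocks are copied unchanged from the inputs.
    for m, start, n in zip(matrices, offsets[:-1], sizes):
        out[start:start + n, start:start + n] = m
    return np.clip(out, -1.0, 1.0)

# Two small, made-up input matrices for illustration only.
style = np.array([[1.0, 0.3], [0.3, 1.0]])
color = np.array([[1.0, -0.2], [-0.2, 1.0]])
joined = concatenate_correlations([style, color])
```

In this sketch the first 'Style' Element and the first 'Color' Element receive `cross + matched_bonus`, while mismatched positions receive only `cross`. The input blocks themselves are left untouched on the diagonal.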

More Help: Examples and sample workflows can be found at the Scientific Strategy website: www.scientificstrategy.com

Node details

Input ports
  1. Type: Table
    Input Correlation Matrix A
    Input Correlation Matrix A : The first input set of Correlations that define the relationship between Customer Distributions of the same name. The Correlation Matrix must be symmetrical such that the number of data rows matches the number of columns. Each row Distribution Name should be unique among all three Input Correlation Matrices and correspond to a column of the same name. The Input Correlation Matrix should include the following columns:
    1. Distribution (string): The unique name of the Customer Distribution. This name should correspond to a column of the same name in the same Input Correlation Matrix. The Distribution column can have any name. If multiple string columns are found then the first column is treated as the Distribution name column and the other string columns are ignored. If no string columns are found then the RowID column is treated as the Distribution name column.
    2. Order (integer - optional): The specified Order of the Distribution used for matching Elements in other Correlation Matrices. If this Order is not provided then the row index will be used instead.
    3. Correlation Values (double): The correlation value between each Customer Distribution row and each Customer Distribution column. As the Correlation Matrix is expected to be symmetrical, each row-column value should be the same as each column-row value. If multiple correlations are provided for A:B or B:A then the highest non-zero correlation will be used. Left-Lower or Right-Upper triangular matrices can also be used. The diagonal values should all be equal to 1.0.
  2. Type: Table
    Input Correlation Matrix B
    Input Correlation Matrix B (optional) : The second input set of Correlations that define the relationship between Customer Distributions of the same name. The Correlation Matrix must be symmetrical such that the number of data rows matches the number of columns. Each row Distribution Name should be unique among all three Input Correlation Matrices and correspond to a column of the same name. The Input Correlation Matrix should include the following columns:
    1. Distribution (string): The unique name of the Customer Distribution. This name should correspond to a column of the same name in the same Input Correlation Matrix. The Distribution column can have any name. If multiple string columns are found then the first column is treated as the Distribution name column and the other string columns are ignored. If no string columns are found then the RowID column is treated as the Distribution name column.
    2. Order (integer - optional): The specified Order of the Distribution used for matching Elements in other Correlation Matrices. If this Order is not provided then the row index will be used instead.
    3. Correlation Values (double): The correlation value between each Customer Distribution row and each Customer Distribution column. As the Correlation Matrix is expected to be symmetrical, each row-column value should be the same as each column-row value. If multiple correlations are provided for A:B or B:A then the highest non-zero correlation will be used. Left-Lower or Right-Upper triangular matrices can also be used. The diagonal values should all be equal to 1.0.
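
The handling of triangular or inconsistent inputs described for the Correlation Values column can be sketched as follows. This is a hedged illustration in Python with NumPy, not the node's code: the function name `symmetrize` is hypothetical, a value of 0.0 is assumed to mean "not provided", "highest" is assumed to mean the numerically larger of the two values, and forcing the diagonal to 1.0 is also an assumption.

```python
import numpy as np

def symmetrize(matrix):
    """Fill in a full symmetric matrix from a triangular or inconsistent one."""
    upper = np.asarray(matrix, dtype=float)
    lower = upper.T
    # Where one side is zero (assumed missing), take the other side;
    # where both sides are non-zero, take the larger of the two.
    merged = np.where(upper == 0.0, lower,
                      np.where(lower == 0.0, upper, np.maximum(upper, lower)))
    # Assumption: the diagonal is forced to 1.0 rather than validated.
    np.fill_diagonal(merged, 1.0)
    return merged

# A made-up Left-Lower triangle input for illustration only.
lower_triangle = [[1.0, 0.0, 0.0],
                  [0.5, 1.0, 0.0],
                  [-0.4, 0.2, 1.0]]
full = symmetrize(lower_triangle)
```

Here the missing upper-right entries are mirrored from the lower-left triangle, so `full` carries 0.5, -0.4, and 0.2 on both sides of the diagonal.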
Output ports
  1. Type: Table
    Output Correlation Matrix
    Output Correlation Matrix : The output set of correlations that define the relationship between Customer Distributions described in all three Input Correlation Matrices. The Output Correlation Matrix will be symmetrical such that the number of data rows matches the number of columns. The Output Correlation Matrix will contain these columns:
    1. Distribution : Each unique row name found in the Input Correlation Matrices corresponding to a row Customer Distribution.
    2. Order : The Order each unique row Distribution was provided or found in the Input Correlation Matrix.
    3. Correlated Distributions : Each unique column name found in the Input Correlation Matrices, along with the degree of correlation to the row Customer Distribution. Output correlations will be symmetrical and range-limited to between -1.0 and +1.0.
  2. Type: Table
    Output Correlation Repaired Matrix
    Output Correlation Repaired Matrix : The repaired output set of correlations that define the relationship between Customer Distributions described in all three Input Correlation Matrices. Repairing is required when the correlations are unrealistic. For example, if X is highly correlated with Y (for example, X:Y = +0.99) and X is highly correlated with Z (for example, X:Z = +0.99) then Y must be highly correlated with Z (that is, Y:Z >> 0.0). More precisely, the Correlation Matrix must be positive definite, meaning that all of its Eigenvalues are positive. Note that downstream nodes that generate Customer Distributions (such as the Matrix Distributions node or the Feature Generation node) do not need to use this Correlation Repaired Matrix, as these downstream nodes will always first self-repair the Input Correlation Matrix. The Output Correlation Repaired Matrix will contain the same columns as the Output Correlation Matrix:
    1. Distribution : Each unique row name found in the Input Correlation Matrices corresponding to a row Customer Distribution.
    2. Order : The Order each unique row Distribution was provided or found in the Input Correlation Matrix.
    3. Correlated Distributions : Each unique column name found in the Input Correlation Matrices, along with the repaired degree of correlation to the row Customer Distribution. Output correlations will be symmetrical and range-limited to between -1.0 and +1.0.
  3. Type: Table
    Output Correlation Error Matrix
    Output Correlation Error Matrix : The difference between the Output Correlation Matrix and the Output Correlation Repaired Matrix. This is a convenience output to show how the Correlation Matrix needs to be repaired before Customer Distributions can be generated. The Output Correlation Error Matrix will contain the same columns as the Output Correlation Matrix:
    1. Distribution : Each unique row name found in the Input Correlation Matrices corresponding to a row Customer Distribution.
    2. Order : The Order each unique row Distribution was provided or found in the Input Correlation Matrix.
    3. Correlated Distributions : Each unique column name found in the Input Correlation Matrices, along with the difference between the output correlation and the repaired correlation.
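
The relationship between the three outputs can be illustrated with a short Python sketch using NumPy. The repair method the node actually uses is not stated in this description. The sketch below substitutes a common technique (clipping negative Eigenvalues and rescaling the diagonal back to 1.0), and the function name `repair_correlation`, the `min_eig` threshold, and the sign convention of the error (output minus repaired) are all assumptions.

```python
import numpy as np

def repair_correlation(corr, min_eig=1e-6):
    """Return a positive definite correlation matrix close to corr."""
    corr = np.asarray(corr, dtype=float)
    # Eigen-decomposition of the symmetric input.
    vals, vecs = np.linalg.eigh(corr)
    # Raise any non-positive Eigenvalues to a small positive floor.
    vals = np.clip(vals, min_eig, None)
    fixed = (vecs * vals) @ vecs.T
    # Rescale so the diagonal is 1.0 again.
    d = np.sqrt(np.diag(fixed))
    fixed = fixed / np.outer(d, d)
    # Remove tiny floating-point asymmetries.
    return (fixed + fixed.T) / 2.0

# The unrealistic case from the text: X:Y = X:Z = +0.99 but Y:Z = 0.0.
output = np.array([[1.0, 0.99, 0.99],
                   [0.99, 1.0, 0.0],
                   [0.99, 0.0, 1.0]])
repaired = repair_correlation(output)
error = output - repaired  # assumed sign convention
```

In this example the repair raises Y:Z above zero and lowers X:Y and X:Z, and `error` shows how far each entry had to move.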
Optional Input Ports (Dynamic Inport)
  1. Type: Table
    Input Correlation Matrix C (optional) : The third input set of Correlations that define the relationship between Customer Distributions of the same name. The Correlation Matrix must be symmetrical such that the number of data rows matches the number of columns. Each row Distribution Name should be unique among all three Input Correlation Matrices and correspond to a column of the same name. The Input Correlation Matrix should include the following columns:
    1. Distribution (string): The unique name of the Customer Distribution. This name should correspond to a column of the same name in the same Input Correlation Matrix. The Distribution column can have any name. If multiple string columns are found then the first column is treated as the Distribution name column and the other string columns are ignored. If no string columns are found then the RowID column is treated as the Distribution name column.
    2. Order (integer - optional): The specified Order of the Distribution used for matching Elements in other Correlation Matrices. If this Order is not provided then the row index will be used instead.
    3. Correlation Values (double): The correlation value between each Customer Distribution row and each Customer Distribution column. As the Correlation Matrix is expected to be symmetrical, each row-column value should be the same as each column-row value. If multiple correlations are provided for A:B or B:A then the highest non-zero correlation will be used. Left-Lower or Right-Upper triangular matrices can also be used. The diagonal values should all be equal to 1.0.
