Correlation Concatenation
The Correlation Concatenation node is designed to take up to three Input Correlation Matrices and join them into a single Output Correlation Matrix. The user can specify the degree of Correlation each Matrix will have with the other two Matrices when they are joined.
Concatenating Correlation Matrices is useful when the Horizontal Differentiation of Features has been independently generated but some Correlation is known to exist between the Features. For example, if 'style', 'color', and 'ambience' Features were independently generated, then the Correlation Concatenation node could join these three Features together with some cross-correlation.
All of the row and column names must be unique across all three Input Correlation Matrices; otherwise the Matrices cannot be joined.
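The joining step can be pictured as building one larger block matrix. The sketch below is only an illustration under stated assumptions, not the node's actual implementation: it assumes a single cross-correlation value is applied uniformly between every Distribution in one Matrix and every Distribution in another, and the function name concatenate, the cross dictionary, and the example Distribution names are all hypothetical.

```python
def concatenate(blocks, cross):
    """Join (names, matrix) pairs into one correlation matrix.

    blocks: list of (names, matrix) pairs, one per input matrix.
    cross:  dict mapping a block index pair (i, j) with i < j to the
            assumed uniform correlation between those two blocks.
    """
    all_names = [name for names, _ in blocks for name in names]
    if len(set(all_names)) != len(all_names):
        raise ValueError("row and column names must be unique across all inputs")

    # Starting row/column of each block inside the joined matrix.
    offsets, start = [], 0
    for names, _ in blocks:
        offsets.append(start)
        start += len(names)

    size = len(all_names)
    out = [[0.0] * size for _ in range(size)]
    for bi, (names_i, matrix_i) in enumerate(blocks):
        for bj, (names_j, _) in enumerate(blocks):
            for r in range(len(names_i)):
                for c in range(len(names_j)):
                    if bi == bj:
                        value = matrix_i[r][c]  # keep the input block as-is
                    else:
                        value = cross.get((min(bi, bj), max(bi, bj)), 0.0)
                    # Range-limit every entry to [-1.0, +1.0].
                    out[offsets[bi] + r][offsets[bj] + c] = max(-1.0, min(1.0, value))
    return all_names, out


# Hypothetical inputs: two 'style' distributions, one 'color', one 'ambience'.
style = (["style1", "style2"], [[1.0, 0.5], [0.5, 1.0]])
color = (["color1"], [[1.0]])
ambience = (["ambience1"], [[1.0]])
names, matrix = concatenate([style, color, ambience],
                            {(0, 1): 0.3, (0, 2): 0.1, (1, 2): 0.2})
```

A joined matrix built this way is symmetrical by construction, but nothing guarantees that the chosen cross-correlations are mutually consistent, which is why a repair step may still be needed afterwards.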
More Help: Examples and sample workflows can be found at the Scientific Strategy website: www.scientificstrategy.com.
Input Ports
- Type: Data Input Correlation Matrix A: The first input set of Correlations that define the relationship between Customer Distributions of the same name. The Correlation Matrix must be symmetrical such that the number of data rows matches the number of columns. Each row Distribution Name should be unique among all three Input Correlation Matrices and correspond to a column of the same name. The Input Correlation Matrix should include the following columns:
- Distribution (string): The unique name of the Customer Distribution. This name should correspond to a column of the same name in the same Input Correlation Matrix. The Distribution column can have any name. If multiple string columns are found then the first column is treated as the Distribution name column and the other string columns are ignored. If no string columns are found then the RowID column is treated as the Distribution name column.
- Correlation Values (double): The correlation value between each Customer Distribution row and each Customer Distribution column. As the Correlation Matrix is expected to be symmetrical, each row-column value should be the same as the corresponding column-row value. If different correlations are provided for A:B and B:A then the highest non-zero correlation will be used. Lower-left or upper-right triangular matrices can also be used. The diagonal values should all be equal to 1.0.
- Type: Data Input Correlation Matrix B (optional): The second input set of Correlations that define the relationship between Customer Distributions of the same name. The Correlation Matrix must be symmetrical such that the number of data rows matches the number of columns. Each row Distribution Name should be unique among all three Input Correlation Matrices and correspond to a column of the same name. The Input Correlation Matrix should include the following columns:
- Distribution (string): The unique name of the Customer Distribution. This name should correspond to a column of the same name in the same Input Correlation Matrix. The Distribution column can have any name. If multiple string columns are found then the first column is treated as the Distribution name column and the other string columns are ignored. If no string columns are found then the RowID column is treated as the Distribution name column.
- Correlation Values (double): The correlation value between each Customer Distribution row and each Customer Distribution column. As the Correlation Matrix is expected to be symmetrical, each row-column value should be the same as the corresponding column-row value. If different correlations are provided for A:B and B:A then the highest non-zero correlation will be used. Lower-left or upper-right triangular matrices can also be used. The diagonal values should all be equal to 1.0.
- Type: Data Input Correlation Matrix C (optional): The third input set of Correlations that define the relationship between Customer Distributions of the same name. The Correlation Matrix must be symmetrical such that the number of data rows matches the number of columns. Each row Distribution Name should be unique among all three Input Correlation Matrices and correspond to a column of the same name. The Input Correlation Matrix should include the following columns:
- Distribution (string): The unique name of the Customer Distribution. This name should correspond to a column of the same name in the same Input Correlation Matrix. The Distribution column can have any name. If multiple string columns are found then the first column is treated as the Distribution name column and the other string columns are ignored. If no string columns are found then the RowID column is treated as the Distribution name column.
- Correlation Values (double): The correlation value between each Customer Distribution row and each Customer Distribution column. As the Correlation Matrix is expected to be symmetrical, each row-column value should be the same as the corresponding column-row value. If different correlations are provided for A:B and B:A then the highest non-zero correlation will be used. Lower-left or upper-right triangular matrices can also be used. The diagonal values should all be equal to 1.0.
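The rule for triangular or conflicting inputs can be sketched as follows. This is a minimal illustration, not the node's own code: the function name symmetrize is hypothetical, and "highest non-zero" is assumed here to mean the larger of the two values once any zero entry has been discarded.

```python
def symmetrize(matrix):
    """Return a full symmetrical matrix from a square, possibly triangular, input."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("correlation matrix must have as many rows as columns")

    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                out[i][j] = 1.0  # diagonal values are always 1.0
                continue
            # Consider both the row-column and the column-row entry,
            # ignoring zeros (assumed to mean "not provided").
            candidates = [v for v in (matrix[i][j], matrix[j][i]) if v != 0.0]
            out[i][j] = max(candidates) if candidates else 0.0
    return out


# A lower-left triangular input is expanded into a full matrix.
lower = [[1.0, 0.0, 0.0],
         [0.4, 1.0, 0.0],
         [-0.2, 0.7, 1.0]]
full = symmetrize(lower)

# Conflicting A:B and B:A values resolve to the higher non-zero one.
conflict = symmetrize([[1.0, 0.3],
                       [0.5, 1.0]])
```

Treating zero as "not provided" is what allows a triangular matrix to be accepted, at the cost of not being able to distinguish a missing value from a genuine zero correlation paired with a non-zero one.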
Output Ports
- Type: Data Output Correlation Matrix: The output set of correlations that define the relationship between Customer Distributions described in all three Input Correlation Matrices. The Output Correlation Matrix will be symmetrical such that the number of data rows matches the number of columns. The Output Correlation Matrix will contain these columns:
- Distribution: Each unique row name found in the Input Correlation Matrices corresponding to a row Customer Distribution.
- Correlated Distributions: Each unique column name found in the Input Correlation Matrices, along with the degree of correlation to the row Customer Distribution. Output correlations will be symmetrical and range-limited to between -1.0 and +1.0.
- Type: Data Output Correlation Repaired Matrix: The repaired output set of correlations that define the relationship between Customer Distributions described in all three Input Correlation Matrices. Repairing is required when the correlations are unrealistic. For example, if X is highly correlated with Y (for example, X:Y = +0.99) and X is highly correlated with Z (for example, X:Z = +0.99) then Y must also be highly correlated with Z (that is, Y:Z >> 0.0). More precisely, the Correlation Matrix must be positive definite, meaning that all of its Eigenvalues are positive. Note that downstream nodes that generate Customer Distributions (such as the Matrix Distributions node or the Feature Generation node) do not need to use this Correlation Repaired Matrix, as these downstream nodes will always first self-repair the Input Correlation Matrix. The Output Correlation Repaired Matrix will contain the same columns as the Output Correlation Matrix:
- Distribution: Each unique row name found in the Input Correlation Matrices corresponding to a row Customer Distribution.
- Correlated Distributions: Each unique column name found in the Input Correlation Matrices, along with the repaired degree of correlation to the row Customer Distribution. Output correlations will be symmetrical and range-limited to between -1.0 and +1.0.
- Type: Data Output Correlation Error Matrix: The difference between the Output Correlation Matrix and the Output Correlation Repaired Matrix. This is a convenience output to show how the Correlation Matrix needs to be repaired before Customer Distributions can be generated. The Output Correlation Error Matrix will contain the same columns as the Output Correlation Matrix:
- Distribution: Each unique row name found in the Input Correlation Matrices corresponding to a row Customer Distribution.
- Correlated Distributions: Each unique column name found in the Input Correlation Matrices, along with the difference between the output correlation and the repaired correlation.
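The repair and error outputs can be illustrated with the X, Y, Z example above. The repair algorithm used by the node is not described here, so the sketch below substitutes one common technique, eigenvalue clipping followed by rescaling to a unit diagonal, purely as an assumption. The function name repair_correlation and the min_eigenvalue parameter are hypothetical, NumPy is assumed to be available, and the sign convention of the error (output minus repaired) is also an assumption.

```python
import numpy as np


def repair_correlation(corr, min_eigenvalue=1e-8):
    """Return a nearby positive definite correlation matrix (eigenvalue clipping)."""
    corr = np.asarray(corr, dtype=float)
    sym = (corr + corr.T) / 2.0  # enforce exact symmetry

    # Decompose, then raise any non-positive eigenvalue to a small positive floor.
    eigenvalues, eigenvectors = np.linalg.eigh(sym)
    clipped = np.clip(eigenvalues, min_eigenvalue, None)
    rebuilt = (eigenvectors * clipped) @ eigenvectors.T

    # Rescale so the diagonal is 1.0 again, then range-limit to [-1.0, +1.0].
    scale = 1.0 / np.sqrt(np.diag(rebuilt))
    repaired = rebuilt * np.outer(scale, scale)
    repaired = np.clip((repaired + repaired.T) / 2.0, -1.0, 1.0)
    np.fill_diagonal(repaired, 1.0)
    return repaired


# X:Y = +0.99 and X:Z = +0.99 but Y:Z = 0.0, which is unrealistic.
output = np.array([[1.0, 0.99, 0.99],
                   [0.99, 1.0, 0.0],
                   [0.99, 0.0, 1.0]])

repaired = repair_correlation(output)
error = output - repaired  # assumed sign convention for the error matrix

print(np.linalg.eigvalsh(output))    # the smallest eigenvalue is negative
print(np.linalg.eigvalsh(repaired))  # eigenvalues after the repair
print(error)
```

In this example the repair pulls Y:Z upward and X:Y and X:Z downward, so the entries of the error matrix indicate which input correlations were mutually inconsistent.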
Extension
This node is part of the extension
Market Simulation nodes by Scientific Strategy for KNIME - Community Edition
v4.0.0