This simple workflow shows how to use the verified component “Conditional Density Plot”. When looking for a good predictor of Y, a general question the component helps to answer about a feature X is:
"Is X a good feature?"
A part of this question can be answered by answering instead:
"Does the distribution of X change by partitioning with class Y?".
If it does, X might be a good predictor for Y. If the partitions separate too well (i.e. 100%, with no overlap between the classes), data leakage might be present.
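The idea can also be sketched outside KNIME. The following minimal Python example is not the component's implementation; it only illustrates the reasoning, using histograms instead of smoothed densities. The function names, bin edges and toy values are hypothetical and are not taken from the Titanic data. It computes one normalised histogram of X per class of Y and then an overlap score: a value near 1 means the distribution of X barely changes with Y, while a value of 0 means perfect separation, which may hint at data leakage.

```python
from collections import defaultdict


def conditional_histograms(xs, ys, bin_edges):
    """Return {class label: list of bin proportions} for feature values xs."""
    n_bins = len(bin_edges) - 1
    counts = defaultdict(lambda: [0] * n_bins)
    for x, y in zip(xs, ys):
        for i in range(n_bins):
            last = i == n_bins - 1
            # Bins are half-open [lo, hi), except the last one, which includes hi.
            if bin_edges[i] <= x < bin_edges[i + 1] or (last and x == bin_edges[-1]):
                counts[y][i] += 1
                break
    # Normalise per class so that classes of different size are comparable.
    return {label: [c / sum(row) for c in row] for label, row in counts.items()}


def overlap(p, q):
    """Overlap of two normalised histograms: 1.0 = identical, 0.0 = disjoint."""
    return sum(min(a, b) for a, b in zip(p, q))


# Hypothetical toy data: y is the class, x_sep and x_same are two candidate features.
y = [0, 0, 0, 0, 1, 1, 1, 1]
x_sep = [1, 2, 2, 3, 7, 8, 8, 9]    # values differ completely between the classes
x_same = [1, 6, 2, 7, 1, 6, 2, 7]   # values are distributed identically in both classes
edges = [0, 5, 10]

h_sep = conditional_histograms(x_sep, y, edges)
h_same = conditional_histograms(x_same, y, edges)

print(overlap(h_sep[0], h_sep[1]))    # no overlap: perfect separation
print(overlap(h_same[0], h_same[1]))  # full overlap: X carries no information about Y
```

In practice most useful features fall somewhere between these two extremes, and the plot itself shows where along X the classes differ.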
The example analysis uses the Titanic dataset. Each row of the data represents a passenger, with columns such as name, sex, age, booked class, fare, ... and whether the passenger survived the incident. The task is to identify a feature that can act as a good predictor of whether someone survived.
Drag and drop the component into another workflow from its KNIME Hub page (link below) to visualize a different dataset. Apply your custom settings by opening the component dialog. More information is available in the component description.
Note that a Conditional Box Plot can be used for a similar but less detailed analysis. The Conditional Density Plot offers more insight into the distribution itself.
Workflow
Conditional Density Plot Component Example on the Titanic Dataset
Created with KNIME Analytics Platform version 4.4.2