This exercise covers statistics, data distribution, outliers and different scales.
Steps:
Read the Excel file with the Boston house prices, available in the folder data/
Remove the first (index) column
Check the data distribution of the prices (column MEDV)
Are there any outliers?
Check the distribution of values for all columns and check whether there are any missing values.
Learn about the basic statistical indicators
Remove the rows containing outliers of MEDV and check the result (to avoid leaving any outliers, the IQR multiplier should be set to 1)
Use the box plot to check the outliers
Normalize your data using the z-score
Check the box plots and statistics again (the resulting standard deviation should be 1 for all columns)
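For readers who want to see the arithmetic behind the outlier removal and z-score steps outside KNIME, the following is a minimal Python sketch. It is not the workflow itself. The function names, the tiny made-up data set (only two columns, RM and MEDV, with invented values) and the choice of quartile method and sample standard deviation are all assumptions for illustration; the KNIME nodes may compute quartiles or the standard deviation slightly differently.

```python
import statistics


def iqr_bounds(values, k=1.0):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR.

    The "inclusive" quartile method is an assumption, not necessarily
    what the KNIME workflow uses.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outlier_rows(rows, column, k=1.0):
    """Keep only the rows whose value in `column` lies within the fences."""
    lower, upper = iqr_bounds([row[column] for row in rows], k)
    return [row for row in rows if lower <= row[column] <= upper]


def zscore_columns(rows):
    """Rescale every column to mean 0 and sample standard deviation 1.

    A constant column has standard deviation 0 and would need special
    handling; this sketch does not cover that case.
    """
    columns = list(rows[0].keys())
    mean = {c: statistics.mean([r[c] for r in rows]) for c in columns}
    std = {c: statistics.stdev([r[c] for r in rows]) for c in columns}
    return [{c: (r[c] - mean[c]) / std[c] for c in columns} for r in rows]


# Invented sample rows; the last one is an obvious MEDV outlier.
rows = [
    {"RM": 6.0, "MEDV": 20.0},
    {"RM": 6.1, "MEDV": 21.0},
    {"RM": 6.2, "MEDV": 22.0},
    {"RM": 6.3, "MEDV": 23.0},
    {"RM": 6.4, "MEDV": 24.0},
    {"RM": 6.5, "MEDV": 25.0},
    {"RM": 6.6, "MEDV": 26.0},
    {"RM": 8.0, "MEDV": 50.0},
]

cleaned = remove_outlier_rows(rows, "MEDV", k=1.0)  # IQR multiplier set to 1
normalized = zscore_columns(cleaned)

# Re-check the statistics: each column should now have standard deviation 1.
for column in normalized[0]:
    values = [r[column] for r in normalized]
    print(column, round(statistics.mean(values), 6), round(statistics.stdev(values), 6))
```

A box plot conventionally draws its whiskers at 1.5 times the IQR, so filtering with the stricter multiplier of 1 makes it less likely that the box plot drawn on the cleaned data flags new points as outliers.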
Workflow
Statistics, data distribution, outliers and data normalization
Created with KNIME Analytics Platform version 4.2.1