Spark Missing Value

Node / Manipulator

This node handles missing values in the incoming Spark data. The first tab of the dialog (labeled Default) sets the default handling for all columns of a given type. These settings apply to every column in the input table that is not explicitly listed in the second tab (labeled Individual). The second tab allows individual settings for each available column, overriding the default. To use it, select the column or columns that need special handling, click "Add", and set the parameters. Clicking the label with the column name(s) selects all covered columns in the column list. To remove the individual handling and fall back to the default, click the "Remove" button for that column.
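
The precedence between the two tabs can be illustrated with a small sketch in plain Python. This is not the node's implementation and does not use Spark. The strategy names (column_mean, fix_value), the type labels ("number", "string"), and the sample table are all hypothetical and chosen only for illustration. The sketch assumes that a column listed under Individual uses its own setting, and that every other column falls back to the default for its type.

```python
from statistics import mean


def fix_value(value):
    """Hypothetical strategy: replace every missing cell with a fixed value."""
    return lambda values: value


def column_mean(values):
    """Hypothetical strategy: replace missing cells with the mean of the present ones."""
    return mean(v for v in values if v is not None)


def resolve_strategy(column, col_type, defaults, individual):
    """An individual setting for a column overrides the default for its type."""
    if column in individual:
        return individual[column]
    # None means no default is configured: missing values are left untouched.
    return defaults.get(col_type)


def replace_missing(table, types, defaults, individual):
    """Return a copy of the table with missing cells (None) replaced per column."""
    result = {}
    for column, values in table.items():
        strategy = resolve_strategy(column, types[column], defaults, individual)
        if strategy is None:
            result[column] = list(values)
            continue
        replacement = strategy(values)
        result[column] = [replacement if v is None else v for v in values]
    return result


# Illustrative data only; None stands in for a missing cell.
table = {
    "age": [30, None, 50],
    "score": [1.0, None, 3.0],
    "city": ["Berlin", None, "Zurich"],
}
types = {"age": "number", "score": "number", "city": "string"}

# "Default" tab: one strategy per column type.
defaults = {"number": column_mean, "string": fix_value("unknown")}
# "Individual" tab: per-column overrides.
individual = {"age": fix_value(0)}

cleaned = replace_missing(table, types, defaults, individual)
print(cleaned)
```

Here "age" uses its individual setting even though a default exists for numeric columns, while "score" and "city" fall back to the defaults for their types. Deleting the "age" entry from individual corresponds to clicking "Remove" in the dialog: the column would then use the numeric default again.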

This node requires at least Apache Spark 2.0.

Node details

Input ports

Type: Spark Data
Spark Data
Spark data with missing values

Output ports

Type: Spark Data
Spark Data
Spark data with replaced values
Type: PMML
PMML Model
PMML documenting the missing value replacements
