Logistic Regression Learner (deprecated)

Performs a multinomial logistic regression. In the dialog, select a target column (the combo box at the top), i.e. the response. The two lists in the center of the dialog let you choose which columns are used as (independent) variables. Make sure the columns you want to include are in the right-hand "include" list. See the Wikipedia article on logistic regression for an overview of the topic. This implementation computes the model with an iterative optimization procedure known as Fisher scoring.
If the optional PMML inport is connected and contains preprocessing operations in the TransformationDictionary, those are added to the learned model.
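
To give a feel for the Fisher scoring procedure mentioned above, here is a minimal sketch in plain Python. It is not the node's implementation: it covers only the binary case (the node fits a multinomial model), and the function names `solve` and `fisher_scoring` as well as the toy data are made up for illustration. Each iteration computes the score vector and the Fisher information matrix, then solves a small linear system for the update step.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            # Typical causes: constant or perfectly correlated columns.
            raise ValueError("singular information matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fisher_scoring(rows, y, max_iter=25, tol=1e-8):
    """Fit a binary logistic regression; returns [intercept, coef_1, ...].

    rows: list of numeric feature lists, y: list of 0/1 target values.
    """
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        # Predicted probabilities under the current coefficients.
        p = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi)))) for xi in X]
        # Score vector: gradient of the log-likelihood, X^T (y - p).
        score = [sum(xi[j] * (yi - pi) for xi, yi, pi in zip(X, y, p)) for j in range(k)]
        # Fisher information matrix: X^T W X with W = diag(p * (1 - p)).
        info = [[sum(xi[a] * xi[b] * pi * (1.0 - pi) for xi, pi in zip(X, p))
                 for b in range(k)] for a in range(k)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Toy data (hypothetical): one learning column, classes overlap so the fit is finite.
rows = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 1, 0, 1, 1]
beta = fisher_scoring(rows, y)
print(beta)
```

The sketch also hints at why the data properties listed below matter: a constant or redundant column makes the information matrix singular, and perfectly separated classes keep the coefficients growing instead of converging.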

Potential Errors and Error Handling

The computation of the model is an iterative optimization process that requires certain properties of the data set: a reasonable distribution of the target values and non-constant, uncorrelated columns. While some of these properties are checked during node execution, you may still run into errors during the computation. The list below gives some ideas of what might go wrong and how to avoid such situations.

Insufficient Information: This is the case when the data does not provide enough information about one or more target categories. Try to get more data, or remove the rows for target categories that may cause the error. If you are interested in a model for a single target category, group the target column beforehand. For instance, if your data contains the target categories "A", "B", ..., "Z" but you only want a model for class "A", you can use a Rule Engine node to convert your target into "A" and "not A".
Violation of Independence: Logistic regression is based on the assumption of statistical independence. A common preprocessing step is to use a correlation filter to remove highly correlated learning columns. Use a "Linear Correlation" node along with a "Correlation Filter" node to remove redundant columns; it is often sufficient to compute the correlation model on a subset of the data only.
Separation: This occurs when the learning columns perfectly predict the target, in which case the maximum likelihood estimate does not exist and the optimization does not converge. Please see the article about separation for more information.
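
Two of the preprocessing steps from the list above can be sketched outside of KNIME in plain Python. This is only an illustration under stated assumptions: the helper names `binarize_target`, `pearson` and `correlation_filter`, the threshold of 0.9, and the toy columns are hypothetical, and the greedy filter shown here is not necessarily the strategy used by the "Correlation Filter" node.

```python
import math

def binarize_target(values, positive):
    """Recode a multi-class target into `positive` and "not <positive>"."""
    return [v if v == positive else "not " + positive for v in values]

def pearson(a, b):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Undefined for constant columns; treated as 0 here for simplicity.
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def correlation_filter(columns, threshold=0.9):
    """Greedy filter: keep a column only if its absolute correlation with
    every column kept so far does not exceed the threshold.

    columns: dict mapping column name -> list of numbers.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical target and learning columns.
target = binarize_target(["A", "B", "Z", "A"], "A")
columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],  # almost a multiple of "a", hence redundant
    "c": [5, 1, 4, 2, 3],
}
kept = correlation_filter(columns)
print(target, kept)
```

Because the filter is greedy, the result depends on column order: the first column of a highly correlated pair is kept and the later one is dropped.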

Node details


Input ports

Type: Table
Input data
Table on which to perform regression. The input must not contain missing values; fix them first, e.g. by using the Missing Value node.

Output ports

Type: PMML
Model for Predictor
Model to connect to a predictor node.
Type: Table
Coefficients and Statistics
Coefficients and statistics of the logistic regression model.
