Performs a multinomial logistic regression. In the dialog, select a target column (combo box on top), i.e. the response. The solver combo box lets you select which solver should be used for the problem (see below for details on the different solvers). The two lists in the center of the dialog allow you to include only certain columns, which represent the (independent) variables. Make sure the columns you want to include are in the right "include" list. See the Wikipedia article on logistic regression for an overview of the topic.
Important Note on Normalization
The SAG solver works best with z-score normalized data. That means that the columns are normalized to have zero mean and a standard deviation of one. This can be achieved by using a normalizer node before learning. If you have very sparse data (lots of zero values), this normalization will destroy the sparsity. In this case it is recommended to only normalize the dense features to exploit the sparsity during the calculations (SAG solver with lazy calculation). Note, however, that the normalization will lead to different coefficients and statistics of those (standard error, z-score, etc.). Hence if you want to use the learner for statistics (obtaining the mentioned statistics) rather than machine learning (obtaining a classifier), you should carefully consider if normalization makes sense for your task at hand. If the node outputs missing values for the parameter statistics, this is very likely caused by insufficient normalization and you will have to use the IRLS solver if you can't normalize your data.
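As a rough illustration of the z-score normalization described above, the following Python sketch rescales a single column and shows why a sparse column stops being sparse afterwards. The helper name `z_score` and the sample values are made up for this example, and the use of the population standard deviation is an assumption; a normalizer node may use a different convention.

```python
from statistics import mean, pstdev


def z_score(column):
    """Rescale a column to zero mean and unit standard deviation.

    Hypothetical helper for illustration only. It uses the population
    standard deviation, which is an assumption.
    """
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column cannot be rescaled; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# A dense column: after normalization it has mean 0 and standard deviation 1.
dense = [20.0, 30.0, 40.0, 50.0]
dense_normalized = z_score(dense)

# A sparse column: the zeros are shifted away from zero, so sparsity is lost.
sparse = [0.0, 0.0, 0.0, 4.0]
sparse_normalized = z_score(sparse)
zeros_before = sum(1 for x in sparse if x == 0)
zeros_after = sum(1 for x in sparse_normalized if x == 0)
```

Here `zeros_before` counts three zero entries while `zeros_after` counts none, which is the loss of sparsity the note warns about.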
Solvers
The solver is the most important choice you make as it will dictate which algorithm is used to solve the problem.
- Iteratively reweighted least squares: This solver uses an iterative optimization approach, sometimes also called Fisher scoring, to calculate the model. It works well for small tables with only a few columns but fails on larger tables. Note that it is the most error-prone solver because it can't calculate a model if the data is linearly separable (see Potential Errors and Error Handling for more information). This solver is also not capable of dealing with tables that have more columns than rows because it does not support regularization.
- Stochastic average gradient (SAG): This solver implements a variant of stochastic gradient descent which tends to converge considerably faster than vanilla stochastic gradient descent. For more information on the algorithm see the following paper. It works well for large tables and also for tables with more columns than rows. Note that in the latter case a regularization prior other than "uniform" must be selected. The default learning rate of 0.1 was selected because it often works well, but ultimately the optimal learning rate always depends on the data and should be treated as a hyperparameter.
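To give an idea of how a stochastic average gradient update works, here is a minimal Python sketch. It is not the node's implementation: it assumes a binary (not multinomial) target coded as 0/1, a fixed step size, no regularization, and a made-up toy data set. All function and variable names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sag_logistic(X, y, learning_rate=0.1, epochs=200, seed=0):
    """Sketch of SAG for binary logistic regression (illustrative only)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    stored = [0.0] * n    # last residual seen for each row
    grad_sum = [0.0] * d  # sum of the stored per-row gradients
    for _ in range(epochs * n):
        i = rng.randrange(n)
        score = sum(wj * xj for wj, xj in zip(w, X[i]))
        residual = sigmoid(score) - y[i]
        delta = residual - stored[i]
        stored[i] = residual
        for j in range(d):
            # Swap the old gradient of row i for the new one, then step
            # along the average of all stored gradients.
            grad_sum[j] += delta * X[i][j]
            w[j] -= learning_rate * grad_sum[j] / n
    return w


# Toy data: first column is a constant 1 (intercept), second is the feature.
# The classes overlap, so the data is not linearly separable.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, -0.5], [1.0, 0.5], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 0, 1, 1]
w = sag_logistic(X, y)
```

The key difference from plain stochastic gradient descent is that each step uses the average of the most recent gradient of every row, not just the gradient of the row that was sampled.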
Learning Rate/Step Size Strategy
Only relevant for the SAG solver. The learning rate strategy provides the learning rates for the gradient descent. When selecting a learning rate strategy and an initial learning rate, keep in mind that there is always a trade-off between the size of the learning rate and the number of epochs required to converge to a solution. With a smaller learning rate the solver will take longer to find a solution, but if the learning rate is too large it might skip over the optimal solution and, in the worst case, diverge.
- Fixed: The provided step size is used for the complete training. This strategy works well for the SAG solver, even if relatively large learning rates are used.
- Line Search: Experimental learning rate strategy that tries to find the optimal learning rate for the SAG solver.
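The trade-off between learning rate and number of steps can be seen on a deliberately simple problem. The sketch below runs plain gradient descent (not SAG) with a fixed step size on the function f(w) = w^2; the function, the tolerance, and the three learning rates are arbitrary choices made for illustration.

```python
def steps_to_converge(learning_rate, start=1.0, tol=1e-6, max_steps=10_000):
    """Count gradient descent steps on f(w) = w**2 until |w| < tol.

    Returns None if the iterate diverges or does not converge in time.
    """
    w = start
    for step in range(1, max_steps + 1):
        w -= learning_rate * 2.0 * w  # gradient of w**2 is 2*w
        if abs(w) < tol:
            return step
        if abs(w) > 1e6:
            return None  # the iterates are growing: diverged
    return None


small = steps_to_converge(0.01)   # converges, but needs many steps
medium = steps_to_converge(0.4)   # converges in a handful of steps
too_large = steps_to_converge(1.1)  # overshoots further each step
```

With these settings `small` is far larger than `medium`, and `too_large` is `None` because every step lands further from the minimum than the previous one.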
Regularization
The SAG solver optimizes the problem using maximum a posteriori estimation, which allows you to specify a prior distribution for the coefficients of the resulting model. This form of regularization is the Bayesian version of other regularization approaches such as Ridge or LASSO. Currently the following priors are supported:
- Uniform: This prior corresponds to no regularization at all and is the default. It essentially means that all values are equally likely for the coefficients.
- Gauss: The coefficients are assumed to be normally distributed. This prior keeps the coefficients from becoming too large but does not force them to be zero. Using this prior is equivalent to using ridge regression (L2) with a lambda of 1/prior_variance.
- Laplace: The coefficients are assumed to follow a Laplace or double exponential distribution. It tends to produce sparse solutions by forcing unimportant coefficients to be zero. It is therefore related to the LASSO (also known as L1 regularization).
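The link between the priors above and the familiar penalty terms can be written out as follows. This is a sketch under stated assumptions: the coefficients are taken to be independent with zero-mean priors, and the symbols (coefficient vector \beta, prior variance \sigma^2, Laplace scale b) are introduced here for illustration and do not come from the node description.

```latex
\begin{align*}
\hat{\beta} &= \arg\max_{\beta} \left[ \sum_{i=1}^{n} \log p(y_i \mid x_i, \beta) + \log p(\beta) \right] \\
\text{Uniform:} \quad \log p(\beta) &= \text{const} \\
\text{Gauss:} \quad \log p(\beta) &= -\frac{1}{2\sigma^2} \sum_{j} \beta_j^2 + \text{const} \\
\text{Laplace:} \quad \log p(\beta) &= -\frac{1}{b} \sum_{j} \lvert \beta_j \rvert + \text{const}
\end{align*}
```

Maximizing the first line with the Gauss prior is the same as minimizing the negative log-likelihood plus (\lambda / 2) \sum_j \beta_j^2 with \lambda = 1/\sigma^2, which matches the lambda of 1/prior_variance mentioned above when the penalty is written with the factor 1/2. With the Laplace prior the added term is an L1 penalty with weight 1/b, and with the uniform prior no penalty remains.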
Potential Errors and Error Handling
The computation of the model is an iterative optimization process that requires certain properties of the data set: a reasonable distribution of the target values and non-constant, uncorrelated columns. While some of these properties are checked during the node execution, you may still run into errors during the computation. The list below gives some ideas of what might go wrong and how to avoid such situations.
- Insufficient Information: This is the case when the data does not provide enough information about one or more target categories. Try to get more data or remove rows for target categories that may cause the error. If you are interested in a model for one target category, make sure to group the target column beforehand. For instance, if your data contains the target categories "A", "B", ..., "Z" but you are only interested in a model for class "A", you can use a rule engine node to convert your target into "A" and "not A".
- Violation of Independence: Logistic regression is based on the assumption of statistical independence. A common preprocessing step is to use a correlation filter to remove highly correlated learning columns. Use a "Linear Correlation" node along with a "Correlation Filter" node to remove redundant columns; it is often sufficient to compute the correlation model on a subset of the data only.
- Separation: Please see this article about separation for more information.
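Two of the preprocessing steps suggested in the list above, grouping the target into "A" and "not A" and dropping highly correlated columns, can be sketched in Python as follows. This only illustrates the ideas and is not what the rule engine or correlation filter nodes do internally. The function names, the sample columns, the threshold of 0.9, and the greedy "keep the first column of each correlated group" strategy are all assumptions made for this example.

```python
import math


def one_vs_rest(labels, positive):
    """Collapse every category except `positive` into one 'not ...' label."""
    return [lab if lab == positive else f"not {positive}" for lab in labels]


def pearson(a, b):
    """Pearson correlation of two equally long numeric columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant column: correlation is undefined, report 0
    return cov / math.sqrt(var_a * var_b)


def correlation_filter(columns, threshold=0.9):
    """Greedily keep columns that are not strongly correlated with kept ones."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


target = one_vs_rest(["A", "B", "Z", "A"], positive="A")

columns = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],  # same information as height_cm
    "age": [35.0, 22.0, 58.0, 41.0, 29.0],
}
kept = correlation_filter(columns)
```

In this example `height_in` is dropped because it is almost perfectly correlated with `height_cm`, while `age` is kept.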