Node / Manipulator

Mahalanobis Distance

Analytics > Distance Calculation > Distance Functions

The Mahalanobis distance is a metric that measures the distance between two data points with respect to the variance and covariance of the selected variables.
It is defined as
$$d(x, y) = \sqrt{(x - y)^{T} S^{-1} (x - y)}$$
where x and y are two random vectors from the same distribution with covariance matrix S.
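
The definition can be written out in a few lines of NumPy. This is only a minimal sketch, not the node's implementation: the function name `mahalanobis` and the values in `S` are made up for illustration.

```python
import numpy as np

def mahalanobis(x, y, cov):
    """Return sqrt((x - y)^T S^{-1} (x - y)) for covariance matrix S."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve S z = diff instead of forming the explicit inverse of S.
    z = np.linalg.solve(np.asarray(cov, dtype=float), diff)
    return float(np.sqrt(diff @ z))

# Illustrative covariance matrix (made-up values, not from any data set).
S = np.array([[4.0, 0.0],
              [0.0, 1.0]])
d = mahalanobis([2.0, 0.0], [0.0, 0.0], S)
print(d)  # 1.0
```

A Euclidean offset of 2 along the first axis counts as a distance of 1 here, because that variable has a standard deviation of 2.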

Explanation:
Assume that we have a data set in a 2-dimensional Euclidean space and we want to estimate the probability that a point P1 (x,y) belongs to this set. Intuitively, the closer P1 is to the center of mass of the set, the more likely it is to belong to it. We also have to consider the spread of the data. A data set with correlated variables forms an ellipse around the center of mass in 2-dimensional Euclidean space. The probability that a test point belongs to the set therefore also depends on the direction of the axes of that ellipse (or ellipsoid, in an N-dimensional Euclidean space). The ellipsoid that best represents the set's probability distribution can be estimated by building the covariance matrix of the samples, which is what the Mahalanobis distance uses.
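
To make the ellipse picture concrete, the sketch below estimates the covariance matrix of synthetic, correlated 2-dimensional data and reads the ellipse axes from its eigendecomposition. The data, the seed, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

# Synthetic data: y is strongly correlated with x (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + 0.3 * rng.normal(size=500)
data = np.column_stack([x, y])

cov = np.cov(data, rowvar=False)          # 2x2 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order

major_axis = eigvecs[:, -1]               # direction of largest spread
semi_axes = np.sqrt(eigvals)              # relative lengths of the semi-axes
print(major_axis, semi_axes)
```

The eigenvectors give the directions of the ellipse axes and the square roots of the eigenvalues give their relative lengths. For this data the major axis lies close to the diagonal.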
If the covariance matrix is the identity matrix, the variables of the data set are uncorrelated with unit variance, and the Mahalanobis distance reduces to the Euclidean distance.
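
A quick numerical check of this special case, using made-up vectors:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])
diff = x - y

identity = np.eye(3)
# With S = I, solving S z = diff returns diff itself.
d_mahalanobis = np.sqrt(diff @ np.linalg.solve(identity, diff))
d_euclidean = np.linalg.norm(diff)
print(d_mahalanobis, d_euclidean)  # both 5.0
```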

Use case:
A typical use case is outlier detection. Intuitively, outliers are points with a very high Mahalanobis distance compared to the other points in the data set.
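
One way this could look in practice is sketched below. It assumes that each point is compared against the mean of the data set and that a fixed cut-off is used. The synthetic data, the planted point, and the `threshold` value are all hypothetical and not part of the node.

```python
import numpy as np

# Synthetic correlated data plus one planted point off the correlation axis.
rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 0.8 * x + 0.2 * rng.normal(size=200)
data = np.column_stack([x, y])
data = np.vstack([data, [3.0, -3.0]])     # planted point at row index 200

mean = data.mean(axis=0)
cov = np.cov(data, rowvar=False)
diff = data - mean

# Row-wise Mahalanobis distance of every point to the mean.
solved = np.linalg.solve(cov, diff.T).T   # each row is S^{-1} (p - mean)
dist = np.sqrt(np.einsum("ij,ij->i", diff, solved))

threshold = 3.0                           # hypothetical cut-off
outliers = np.flatnonzero(dist > threshold)
print(outliers)
```

The planted point is not extreme in either coordinate taken alone, but it lies against the direction of correlation, so its Mahalanobis distance is by far the largest.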

Node details

Input ports
  1. Type: Table
    Input Table
    Input data.
  2. Type: Table
    Covariance Matrix Table
    Optional covariance input table. The matrix must be square and have identical column/row pairs. If unconnected, the covariance matrix is computed from the selected input columns.
Output ports
  1. Type: Distance Measure
    Distance Measure
    The configured distance.
  2. Type: Table
    Covariance Matrix Table
    The computed covariance matrix.
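
The optional second input can be pictured as follows: use the supplied matrix if there is one, otherwise estimate it from the data. This is a sketch of that logic only. `resolve_covariance` is a hypothetical helper, not a KNIME API, and it checks only the shape of the supplied matrix.

```python
import numpy as np

def resolve_covariance(data, cov=None):
    """Hypothetical helper: use the supplied matrix if given, else estimate it."""
    data = np.asarray(data, dtype=float)
    if cov is None:
        # No matrix supplied: estimate it from the columns of the data.
        return np.cov(data, rowvar=False)
    cov = np.asarray(cov, dtype=float)
    n_cols = data.shape[1]
    # The supplied matrix must be square and match the number of columns.
    if cov.shape != (n_cols, n_cols):
        raise ValueError(f"expected a {n_cols}x{n_cols} matrix, got {cov.shape}")
    return cov

# Made-up input table with two numeric columns.
data = np.array([[1.0, 2.0], [2.0, 4.5], [3.0, 5.5], [4.0, 8.0]])
computed = resolve_covariance(data)                  # estimated from the data
supplied = resolve_covariance(data, cov=np.eye(2))   # passed through unchanged
```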

Related workflows & nodes

  1. Basic example of Mahalanobis distance calculation
    Variances Distance Example
    This KNIME workflow demonstrates usage of the Mahalanobis Distance calculation. The calcu…
    lisovyi > Public > forum > 17887 Mahalanobis distance
  2. NIR Spectral Data -Inhouse Database Search
    Lab data NIR spectroscopy Spectroscopy
    This workflow shows how an inhouse database can be created directly from JDX files and se…
    knime > Life Sciences > Laboratory Data > NIR_Spectral_Data_Analysis > 03_NIR_Spectral_Data_Inhouse_Database_Search
