Learns a generative label model from the provided label source columns. This node is a key component for realizing weak supervision approaches as popularized by Snorkel. The idea behind weak supervision is that it is often possible to create a number of simple, inaccurate models (e.g. simple rules or existing models for slightly different tasks) that can label unlabeled data, and that the agreements and disagreements among these models can be analyzed to infer information about the true label. Our implementation is a TensorFlow-based adaptation of the matrix completion approach proposed in a paper by the Snorkel team. We refer to that publication for details on the strategy.
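To make the notion of agreements and disagreements concrete, the following is a minimal Python sketch. It is not the node's algorithm: it uses a plain majority vote and a pairwise agreement rate as a simple stand-in for the matrix completion approach. The source names, the labels, and the helper functions `majority_vote` and `agreement_rate` are hypothetical and chosen only for illustration.

```python
from collections import Counter

ABSTAIN = None  # a source that declines to label a row


def majority_vote(votes):
    """Return the most common non-abstain label, or ABSTAIN on a tie or no votes."""
    counts = Counter(v for v in votes if v is not ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # the top two labels are tied, so no decision is made
    return ranked[0][0]


def agreement_rate(col_a, col_b):
    """Fraction of rows on which both sources voted and agree.

    Returns None if there is no row on which both sources voted.
    """
    both = [(a, b) for a, b in zip(col_a, col_b)
            if a is not ABSTAIN and b is not ABSTAIN]
    if not both:
        return None
    return sum(a == b for a, b in both) / len(both)


# Hypothetical votes of three label sources on four rows.
source_a = ["spam", "ham", None, "spam"]
source_b = ["spam", None, "ham", "ham"]
source_c = ["spam", "ham", "ham", None]

# Baseline combination: one majority vote per row.
combined = [majority_vote(row) for row in zip(source_a, source_b, source_c)]

# Pairwise agreement hints at how reliable each source is relative to the others.
rate_ab = agreement_rate(source_a, source_b)
rate_ac = agreement_rate(source_a, source_c)
```

A generative label model goes further than this baseline: instead of weighting every source equally, it estimates how accurate each source is from such agreement statistics and outputs a probability distribution over the labels.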
- Type: Table. Label Sources: Table containing label sources. A label source is either a nominal column or a probability distribution column. Note that missing values in a label source are interpreted as abstentions, i.e. a missing value is assumed to indicate that the label source decided not to label the corresponding row. Nominal columns that have no set of possible values assigned are ignored during the computation, and a corresponding warning is displayed on the node.
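The handling of nominal label sources described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the node's actual code: the function `encode_sources`, the integer encoding, the `-1` abstain marker, and the column names are all hypothetical.

```python
ABSTAIN = -1  # assumed marker for a missing value (the source abstained)


def encode_sources(columns, domains):
    """Encode nominal label sources as integer class indices.

    columns: dict mapping column name -> list of values (None = missing).
    domains: dict mapping column name -> list of possible values,
             or None if the column has no possible values assigned.
    Returns (encoded, skipped), where skipped lists the ignored columns.
    """
    encoded, skipped = {}, []
    for name, values in columns.items():
        domain = domains.get(name)
        if not domain:
            # No set of possible values: ignore the column and report it.
            skipped.append(name)
            continue
        index = {value: i for i, value in enumerate(domain)}
        encoded[name] = [ABSTAIN if v is None else index[v] for v in values]
    return encoded, skipped


# Hypothetical label source table with three nominal columns.
columns = {
    "rule_1": ["pos", None, "neg"],
    "rule_2": [None, "neg", "neg"],
    "rule_3": ["pos", "pos", None],
}
domains = {
    "rule_1": ["neg", "pos"],
    "rule_2": ["neg", "pos"],
    "rule_3": None,  # no possible values assigned
}

encoded, skipped = encode_sources(columns, domains)
for name in skipped:
    print(f"Warning: label source '{name}' has no possible values and is ignored.")
```

In this sketch `rule_3` is skipped with a warning, and every missing value in the remaining columns becomes an abstain rather than a vote for any class.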