Keyword Search

This component extracts the most relevant English keywords from a corpus (a collection of documents) using three techniques:
- Topic Extraction using LDA: collects a set of keywords for each topic, where the topics cluster the documents into groups.
- Term Co-Occurrence: finds pairs of keywords that frequently appear together across documents.
- Max(TF-IDF) measure: ranks terms by their importance throughout the corpus.

The component takes as input a column of Document type (produced by the Strings To Document node) and identifies keywords in the corpus according to the hyper-parameters set in the configuration dialog. The collected keywords are returned in three output tables, one for each of the techniques above.

By default, the component applies basic text pre-processing (e.g. stopword and symbol removal) for the English language. This pre-processing can be deactivated in the dialog and performed outside the component when working with other or multiple languages.
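To make the Term Co-Occurrence idea concrete, here is a minimal Python sketch that counts how many documents contain both terms of a pair. It is not the component's implementation: the `tokenize` helper, the tiny `STOPWORDS` set, and the toy `corpus` are hypothetical, and the component's own pre-processing and counting rules may differ.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "is", "of", "and", "in", "on"}


def tokenize(text):
    """Lowercase each word, strip surrounding punctuation, drop stopwords."""
    tokens = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [t for t in tokens if t and t not in STOPWORDS]


def cooccurrence_counts(docs):
    """For each unordered term pair, count the documents containing both terms."""
    counts = Counter()
    for doc in docs:
        # Use the set of terms so a pair is counted once per document.
        terms = sorted(set(tokenize(doc)))
        counts.update(combinations(terms, 2))
    return counts


# Hypothetical toy corpus.
corpus = [
    "The cat sat on the mat.",
    "A cat and a dog sat together.",
    "The dog chased the cat.",
]
pairs = cooccurrence_counts(corpus)

# Print the most frequent pairs first.
for pair, n in pairs.most_common(3):
    print(pair, n)
```

Sorting the terms before pairing means that ("cat", "dog") and ("dog", "cat") are treated as the same pair.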

Component details

Input ports

Type: Table
String input of Columns
This component requires input of text columns in String format.

Output ports

Type: Table
LDA Terms
Output of nouns, adjectives and verbs along with weights defined by LDA in a column.
Type: Table
Term Co-Occurrence count
Output of nouns, adjectives and verbs along with counts of terms occurring in the corpus.
Type: Table
TF-IDF
Table output of terms with the highest TF-IDF across all documents.
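
As a rough illustration of what a Max(TF-IDF) ranking computes, the Python sketch below scores each term in each document and keeps the highest score per term. It assumes relative term frequency and an unsmoothed natural-log IDF; the component may use different TF or IDF variants, and the `max_tfidf` function and the toy `docs` are hypothetical.

```python
import math
from collections import Counter


def max_tfidf(docs):
    """docs: list of token lists. Returns {term: highest TF-IDF over all documents}."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    best = {}
    for doc in docs:
        if not doc:
            continue  # skip empty documents to avoid division by zero
        tf = Counter(doc)
        for term, count in tf.items():
            # Assumed variant: relative TF times log(N / df).
            score = (count / len(doc)) * math.log(n_docs / df[term])
            best[term] = max(best.get(term, 0.0), score)
    return best


# Hypothetical, already pre-processed documents.
docs = [
    ["cat", "sat", "mat"],
    ["cat", "dog", "sat"],
    ["dog", "chased", "cat", "cat"],
]
scores = max_tfidf(docs)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With this IDF variant, a term that occurs in every document (here "cat") scores zero, which is why very common terms fall to the bottom of the ranking.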

Legal

By using or downloading the component, you agree to our terms and conditions.

Used extensions & nodes

KNIME Base nodes (Trusted extension)

KNIME Javasnippet (Trusted extension)

KNIME Math Expression (JEP) (Trusted extension)

KNIME Quick Forms (Trusted extension)

KNIME Textprocessing (Trusted extension)