Text clustering of Wikipedia articles. Twelve Wikipedia articles, three each on Philosophy, Religion, Law and Quantum Mechanics, were randomly selected, manually copied from the Internet and saved as twelve text files (*.txt) in one folder. The files were then read and text-processed, and hierarchical clustering was performed on them. The clustering is perfect, although the sample is only 12 files: at the lowest level of the dendrogram, the articles on each subject cluster together first. Any distance measure other than 'cosine' reduces accuracy drastically.
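For readers who want to follow the idea outside KNIME, the steps described above (read text, build term vectors, cluster hierarchically with cosine distance) can be sketched in plain Python. This is only an illustration under stated assumptions, not the workflow's actual nodes or settings: the TF-IDF weighting, the average-linkage rule, the helper names (`tokenize`, `tfidf_vectors`, `cosine_distance`, `average_linkage`) and the eight toy "documents" are all made up for the example and stand in for the twelve Wikipedia articles.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for c in counts for term in c)
    # Smoothed IDF (an assumption of this sketch, not taken from the workflow).
    idf = {t: math.log((1 + n) / (1 + k)) + 1 for t, k in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def average_linkage(vectors, n_clusters):
    """Agglomerative clustering: merge the closest pair until n_clusters remain."""
    dist = [[cosine_distance(a, b) for b in vectors] for a in vectors]
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                total = sum(dist[p][q] for p in clusters[i] for q in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy stand-ins for the articles: two short "documents" per subject.
docs = [
    "philosophy ethics metaphysics logic reason",
    "philosophy epistemology logic reason knowledge",
    "religion faith worship prayer scripture",
    "religion faith ritual prayer temple",
    "law court statute judge contract",
    "law court legislation judge rights",
    "quantum particle wave photon energy",
    "quantum particle wave electron spin",
]

clusters = average_linkage(tfidf_vectors(docs), n_clusters=4)
print(clusters)  # -> [[0, 1], [2, 3], [4, 5], [6, 7]]
```

In this toy data the subjects share no vocabulary, so documents from different subjects sit at cosine distance 1 and each subject's pair merges first. Real articles overlap far more, so the result depends on the preprocessing and on the distance measure chosen.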
Workflow
Text clustering of Wikipedia articles
Used extensions & nodes
Created with KNIME Analytics Platform version 4.7.0