Text clustering of Wikipedia articles
Twelve Wikipedia articles, three each on Philosophy, Religion, Law and Quantum Mechanics, were randomly selected, manually copied from the Internet and saved as twelve text files (*.txt) in a folder. These files were then read and text-processed, and hierarchical clustering was performed on the result. The clustering is perfect, although the sample is only twelve files: at the lowest level of the dendrogram, the articles on each subject cluster together first. Any distance measure other than 'cosine' reduces accuracy drastically.
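The pipeline described above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the project's actual code: the TF-IDF weighting, the average-linkage rule and the helper names (`tokenize`, `tfidf_vectors`, `cosine_distance`, `average_linkage`) are all assumptions, and the four short strings in `docs` are made-up stand-ins for the article files.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict term -> weight) per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for c in counts for term in c)
    # Smoothed inverse document frequency (an assumed weighting scheme).
    idf = {t: math.log((1 + n) / (1 + k)) + 1 for t, k in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def average_linkage(vectors):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    n = len(vectors)
    dist = [
        [cosine_distance(vectors[i], vectors[j]) for j in range(n)]
        for i in range(n)
    ]
    clusters = [[i] for i in range(n)]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


# Hypothetical toy documents: two "Law"-like and two "Quantum-Mechanics"-like.
docs = [
    "court judge statute law contract court",
    "law statute judge appeal court ruling",
    "quantum wave particle electron energy state",
    "electron quantum state energy spin wave",
]

merges = average_linkage(tfidf_vectors(docs))
for left, right, d in merges:
    print(left, "+", right, "at cosine distance", round(d, 3))
```

The merge list plays the role of the dendrogram: early merges correspond to its lowest levels, so same-subject documents joining first is the behaviour reported for the twelve articles. Cosine distance depends only on the direction of the term vectors, not their length, which is one plausible reason it copes better than length-sensitive measures with articles of different sizes.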