A Semantic approach for effective document clustering using WordNet
Leena H. Patil, Mohammed Atique

TL;DR
This paper proposes a semantic document clustering method that leverages WordNet for better term selection and attribute reduction, improving clustering accuracy on various datasets.
Contribution
It introduces a novel approach combining WordNet with traditional preprocessing and term selection methods for enhanced document clustering.
Findings
Improved clustering accuracy demonstrated on multiple datasets.
Effective attribute reduction using WordNet enhances clustering performance.
Comparison shows superiority over baseline methods.
Abstract
Text documents on the internet, in e-mail, and on web pages are growing rapidly, and they are stored in electronic database formats, which makes them difficult to organize and browse. To overcome this problem, document preprocessing, term selection, attribute reduction, and maintaining the relationships between important terms using background knowledge (WordNet) become important steps in data mining. In this paper the approach proceeds in stages. First, documents are preprocessed: stop words are removed, stemming is performed with the Porter stemmer algorithm, the WordNet thesaurus is applied to maintain relationships between important terms, and global unique words and frequent word sets are generated. Second, a data matrix is formed. Third, terms are extracted from the documents using the term selection approaches tf-idf, tf-df, and…
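The stages named in the abstract (stop-word removal, stemming, synonym mapping, data matrix, tf-idf weighting) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `STOP_WORDS` is a tiny made-up list, `crude_stem` is a simple suffix stripper standing in for the Porter stemmer, `SYNONYMS` is a hypothetical hand-written map standing in for a WordNet lookup, and the tf-idf variant (relative term frequency times log(N/df)) is one common formulation that the paper may or may not use.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "on", "in", "and", "to"}

# Hypothetical stand-in for a WordNet lookup: term -> canonical synonym.
SYNONYMS = {"automobile": "car", "vehicle": "car"}


def crude_stem(word):
    """Rough suffix stripper; a placeholder for the Porter stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, stem, then merge synonyms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    tokens = [crude_stem(t) for t in tokens]
    return [SYNONYMS.get(t, t) for t in tokens]


def tf_idf_matrix(docs):
    """Return (vocabulary, document-by-term matrix of tf-idf weights)."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        row = [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        matrix.append(row)
    return vocab, matrix


docs = ["The cars are parked", "An automobile is parked", "Clustering documents"]
vocab, matrix = tf_idf_matrix(docs)
```

With the synonym map in place, the first two example documents reduce to the same terms and therefore get identical rows in the data matrix, which is the effect the semantic step is meant to have on clustering.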
Taxonomy
Topics: Text and Document Classification Technologies · Web Data Mining and Analysis · Data Mining Algorithms and Applications
