A Semantic approach for effective document clustering using WordNet

Leena H. Patil; Mohammed Atique

arXiv:1303.0489 · cs.CL · March 5, 2013 · 5 cites

TL;DR

This paper proposes a semantic document clustering method that leverages WordNet for better term selection and attribute reduction, improving clustering accuracy on various datasets.

Contribution

It introduces a novel approach combining WordNet with traditional preprocessing and term selection methods for enhanced document clustering.

Findings

01

Improved clustering accuracy demonstrated on multiple datasets.

02

Effective attribute reduction using WordNet enhances clustering performance.

03

Comparison shows superiority over baseline methods.

Abstract

The volume of text documents on the internet, in e-mail, and on web pages is growing rapidly, and these documents are stored in electronic databases. This makes them difficult to arrange and browse. To address this problem, document preprocessing, term selection, attribute reduction, and maintaining the relationships between important terms using background knowledge (WordNet) become important steps in data mining. This paper proceeds in stages. First, the documents are preprocessed: stop words are removed, stemming is performed with the Porter stemmer algorithm, the WordNet thesaurus is applied to maintain the relationships between important terms, and global unique words and frequent word sets are generated. Second, a data matrix is formed. Third, terms are extracted from the documents using the term selection approaches tf-idf, tf-df, and…
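
The stages named in the abstract (stop-word removal, mapping related terms to a shared concept, building a data matrix, and tf-idf weighting) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `STOP_WORDS` is a tiny made-up list, `SYNONYMS` is a hypothetical hand-written table standing in for a real WordNet lookup, Porter stemming is omitted for brevity, and the weighting `tf * log(N / df)` is one common tf-idf variant assumed here because the abstract does not give the exact formula.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; not the paper's list).
STOP_WORDS = {"a", "an", "the", "is", "are", "on", "in", "of", "and", "to"}

# Hypothetical stand-in for a WordNet lookup: maps a word to one
# representative term for its synonym set.
SYNONYMS = {"automobile": "car", "auto": "car", "picture": "image", "photo": "image"}


def preprocess(text):
    """Lowercase, tokenise, drop stop words, and merge synonyms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [SYNONYMS.get(t, t) for t in tokens if t not in STOP_WORDS]


def tf_idf(docs):
    """Return (vocabulary, document-term matrix) with tf-idf weights."""
    tokenised = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenised for t in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenised for t in set(doc))
    matrix = []
    for doc in tokenised:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        row = [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        matrix.append(row)
    return vocab, matrix


docs = [
    "The automobile is on the road",
    "A car and a photo",
    "The picture of the road",
]
vocab, matrix = tf_idf(docs)
for row in matrix:
    print([round(w, 3) for w in row])
```

Because "automobile" and "car" are merged before weighting, the first two documents share a term that a purely lexical pipeline would treat as two separate columns. The resulting rows could then be passed to any standard clustering routine.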

Taxonomy

Topics: Text and Document Classification Technologies · Web Data Mining and Analysis · Data Mining Algorithms and Applications