An improved semantic similarity measure for document clustering based on topic maps
Muhammad Rafi, Mohammad Shahid Shaikh

TL;DR
This paper introduces a new semantic similarity measure for document clustering that leverages topic maps to better capture the meaning and context of documents, outperforming traditional vector-based methods.
Contribution
It proposes a novel similarity measure based on topic maps, providing a more semantically aware approach for document clustering compared to existing methods.
Findings
The new measure outperforms traditional similarity measures in experiments.
Topic maps effectively capture semantic relationships in documents.
The approach improves clustering accuracy on text mining datasets.
Abstract
A major computational burden in document clustering is calculating the similarity measure between a pair of documents. A similarity measure is a function that assigns a real number between 0 and 1 to a pair of documents, depending on the degree of similarity between them. A value of zero means that the documents are completely dissimilar, whereas a value of one indicates that the documents are practically identical. Traditionally, vector-based models have been used to compute document similarity. Vector-based models represent several features present in documents. These approaches to similarity measures, in general, cannot account for the semantics of the document. Documents written in human languages contain contexts, and the words used to describe these contexts are generally semantically related. Motivated by this fact, many researchers have proposed…
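For reference, the kind of vector-based baseline the abstract describes can be sketched with cosine similarity over term-frequency vectors. This is an illustration only, not the paper's topic-map measure: the function name `cosine_similarity`, the whitespace tokenization, and the choice of raw term counts are all assumptions made here. For non-negative counts the result falls between 0 and 1, matching the range stated in the abstract.

```python
import math
from collections import Counter


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity of two documents using raw term-frequency vectors.

    Hypothetical helper for illustration: tokenization is a plain
    lowercase whitespace split, with no stemming or stop-word removal.
    """
    tf_a = Counter(doc_a.lower().split())
    tf_b = Counter(doc_b.lower().split())

    # Dot product only needs the terms that occur in both documents.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())

    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))

    # An empty document shares nothing with any other document.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Same words: similarity is at the top of the range.
same = cosine_similarity("topic maps capture meaning", "topic maps capture meaning")

# Same meaning, different words: a purely lexical measure sees no overlap.
synonyms = cosine_similarity("fast car", "quick automobile")
```

The second call shows the limitation the abstract points to: two phrases that a human reader would consider near-synonymous share no terms, so a term-based vector measure scores them as completely dissimilar.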
Taxonomy
Topics: Advanced Text Analysis Techniques · Text and Document Classification Technologies · Topic Modeling
