A Comparison of WordNet and Roget's Taxonomy for Measuring Semantic Similarity
Michael McHale (Air Force Research Laboratory)

TL;DR
This paper evaluates Roget's Thesaurus as a taxonomy for semantic similarity measurement and compares it with WordNet. Traditional edge counting over Roget's correlates with human similarity judgments at r=0.88, close to the r=0.90 upper bound for human subjects.
Contribution
It introduces the use of Roget's Thesaurus for semantic similarity measurement and compares its performance with existing methods like WordNet.
Findings
Edge counting with Roget's achieves a correlation of r=0.88 with human judgments.
Roget's Thesaurus performs comparably to WordNet in semantic similarity tasks.
Traditional edge counting is surprisingly effective for measuring semantic similarity.
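As a rough illustration of what edge counting means, the sketch below measures the number of edges on the shortest path between two words in a small tree. The hierarchy, the `PARENT` table, and the function names are invented for this example; they are not taken from Roget's, WordNet, or the paper.

```python
# Toy taxonomy (hypothetical, for illustration only): child -> parent.
PARENT = {
    "car": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "artifact",
    "fork": "utensil",
    "utensil": "artifact",
    "artifact": None,  # root
}


def path_to_root(node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def edge_count(a, b):
    """Number of edges on the path between a and b through their
    lowest common ancestor. Smaller means more similar."""
    steps_from_b = {n: i for i, n in enumerate(path_to_root(b))}
    for steps_from_a, n in enumerate(path_to_root(a)):
        if n in steps_from_b:
            return steps_from_a + steps_from_b[n]
    raise ValueError("no common ancestor")


print(edge_count("car", "bicycle"))  # siblings under "vehicle"
print(edge_count("car", "fork"))     # related only through the root
```

Edge counting yields a distance, so a similarity score is usually derived from it, for example by subtracting the count from a maximum path length.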
Abstract
This paper presents the results of using Roget's International Thesaurus as the taxonomy in a semantic similarity measurement task. Four similarity metrics were taken from the literature and applied to Roget's. The experimental evaluation suggests that the traditional edge counting approach does surprisingly well (a correlation of r=0.88 with a benchmark set of human similarity judgements, with an upper bound of r=0.90 for human subjects performing the same task).
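The r values quoted above are correlation coefficients between a metric's scores and human ratings over the same word pairs. A minimal sketch of that computation, assuming Pearson's r, follows; the two score lists are invented placeholders, not data from the paper, and the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented placeholder scores for five word pairs (not from the paper).
human_ratings = [3.9, 3.5, 2.1, 1.0, 0.2]   # higher = more similar
metric_scores = [0.95, 0.80, 0.55, 0.30, 0.10]

print(round(pearson_r(human_ratings, metric_scores), 2))
```

A value near 1 means the metric ranks and spaces the word pairs much as the human raters did.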
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
