Roget's Thesaurus and Semantic Similarity
Mario Jarmasz and Stan Szpakowicz

TL;DR
This paper presents a system that measures semantic similarity using Roget's Thesaurus, compares it with WordNet-based methods, and evaluates its performance on noun pair similarity and synonym questions, showing competitive results.
Contribution
The paper introduces a novel semantic similarity measure based on Roget's Thesaurus and evaluates its effectiveness against established WordNet-based methods.
Findings
The Roget's-based system achieves high correlations with human judgments on noun pair similarity: .878 on Miller and Charles' 30 pairs and .818 on Rubenstein and Goodenough's 65 pairs.
The system correctly answers approximately 75-82% of synonym questions across different tests.
The .878 correlation is close to the .885 that Resnik obtained when human subjects replicated the Miller and Charles experiment.
Abstract
We have implemented a system that measures semantic similarity using a computerized 1987 Roget's Thesaurus, and evaluated it by performing a few typical tests. We compare the results of these tests with those produced by WordNet-based similarity measures. One of the benchmarks is Miller and Charles' list of 30 noun pairs to which human judges had assigned similarity measures. We correlate these measures with those computed by several NLP systems. The 30 pairs can be traced back to Rubenstein and Goodenough's 65 pairs, which we have also studied. Our Roget's-based system gets correlations of .878 for the smaller and .818 for the larger list of noun pairs; this is quite close to the .885 that Resnik obtained when he employed humans to replicate the Miller and Charles experiment. We further evaluate our measure by using Roget's and WordNet to answer 80 TOEFL, 50 ESL and 300 Reader's Digest…
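The noun-pair evaluation described in the abstract comes down to correlating two lists of scores: human similarity ratings and system-computed similarity values for the same word pairs. The sketch below assumes a Pearson product-moment correlation, which the abstract does not name explicitly. The function name `pearson` and the numbers in `human_ratings` and `system_scores` are hypothetical placeholders for illustration, not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values only (assumption): one entry per noun pair.
human_ratings = [3.9, 3.5, 2.8, 1.2, 0.4]   # hypothetical human judgments
system_scores = [16, 16, 12, 6, 2]          # hypothetical system output

r = pearson(human_ratings, system_scores)
print(f"correlation: {r:.3f}")
```

The two lists do not need to share a scale, since the coefficient is invariant to linear rescaling of either one; this is what allows scores from different similarity measures to be compared against the same human ratings.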
