Using Information Content to Evaluate Semantic Similarity in a Taxonomy
Philip Resnik

TL;DR
This paper introduces a measure of semantic similarity for IS-A taxonomies based on information content, and shows that it correlates better with human similarity judgments (r = 0.79) than traditional edge counting (r = 0.66).
Contribution
It proposes an information content-based similarity measure for IS-A taxonomies and shows empirically that it outperforms edge counting on a benchmark set of human similarity judgments.
Findings
Correlation of r = 0.79 with a benchmark set of human similarity judgments
Outperforms traditional edge counting (r = 0.66)
Approaches the upper bound of r = 0.90 set by human subjects performing the same task
Abstract
This paper presents a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content. Experimental evaluation suggests that the measure performs encouragingly well (a correlation of r = 0.79 with a benchmark set of human similarity judgments, with an upper bound of r = 0.90 for human subjects performing the same task), and significantly better than the traditional edge counting approach (r = 0.66).
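The abstract does not spell out the measure itself. As it is commonly stated, the similarity of two concepts is the information content, -log p(c), of the most informative concept that subsumes both, where p(c) is the probability of encountering an instance of c. The sketch below illustrates that idea under stated assumptions: the taxonomy, the counts, and all names (`PARENT`, `COUNTS`, `subsumers`, `concept_probabilities`, `ic_similarity`) are hypothetical and chosen only for illustration; they are not taken from the paper or its experiments.

```python
import math

# Hypothetical toy IS-A taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

# Made-up corpus counts for words attached directly to each concept.
COUNTS = {"entity": 0, "animal": 2, "vehicle": 2, "dog": 4, "cat": 4, "car": 4}


def subsumers(concept):
    """Return the concept together with all of its ancestors."""
    result = []
    while concept is not None:
        result.append(concept)
        concept = PARENT[concept]
    return result


def concept_probabilities():
    """Propagate counts upward so each concept also counts its descendants."""
    freq = {c: 0 for c in PARENT}
    for concept, n in COUNTS.items():
        for ancestor in subsumers(concept):
            freq[ancestor] += n
    total = sum(COUNTS.values())
    return {c: f / total for c, f in freq.items()}


def ic_similarity(c1, c2, prob):
    """Information content of the most informative shared subsumer."""
    shared = set(subsumers(c1)) & set(subsumers(c2))
    return max(-math.log(prob[c]) for c in shared)


prob = concept_probabilities()
sim_dog_cat = ic_similarity("dog", "cat", prob)  # shared subsumers: animal, entity
sim_dog_car = ic_similarity("dog", "car", prob)  # shared subsumer: entity only
```

In this toy setup, "dog" and "cat" score higher than "dog" and "car", because their most specific shared subsumer ("animal") is less probable, and therefore more informative, than the root. The root has probability 1 and so contributes zero information content.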
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
