Using Information Content to Evaluate Semantic Similarity in a Taxonomy

Philip Resnik

arXiv:cmp-lg/9511007 · cmp-lg · February 3, 2008 · 2.2k cites

TL;DR

This paper introduces a measure of semantic similarity for IS-A taxonomies based on information content. On a benchmark set of human similarity judgments it correlates better (r = 0.79) than traditional edge counting (r = 0.66).

Contribution

The paper proposes a new similarity measure based on information content and shows empirically that it correlates better with human similarity judgments than edge counting in an IS-A taxonomy.

Findings

01 Correlation of r = 0.79 with human judgments

02 Outperforms traditional edge counting (r = 0.66)

03 Approaches the upper bound of human consistency (r = 0.90)

Abstract

This paper presents a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content. Experimental evaluation suggests that the measure performs encouragingly well (a correlation of r = 0.79 with a benchmark set of human similarity judgments, with an upper bound of r = 0.90 for human subjects performing the same task), and significantly better than the traditional edge counting approach (r = 0.66).
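The abstract does not spell out the measure itself. As the measure is usually stated, the information content of a concept c is -log p(c), where p(c) is the probability of encountering an instance of c, and the similarity of two concepts is the largest information content among the concepts that subsume both. The sketch below illustrates that idea under stated assumptions: the toy taxonomy, the counts, and every function name are hypothetical and are not taken from the paper's experiments.

```python
import math

# Hypothetical toy IS-A taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

# Hypothetical corpus counts of each concept's own occurrences.
COUNTS = {"entity": 0, "animal": 10, "vehicle": 10, "dog": 40, "cat": 30, "car": 10}


def ancestors(concept):
    """Return the concept followed by all of its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def concept_probabilities():
    """Estimate p(c); each occurrence also counts toward every ancestor."""
    freq = {c: 0 for c in PARENT}
    for concept, n in COUNTS.items():
        for a in ancestors(concept):
            freq[a] += n
    total = sum(COUNTS.values())
    return {c: freq[c] / total for c in PARENT}


def information_content(p):
    """Information content, written as log(1/p), which equals -log p."""
    return math.log(1.0 / p)


def similarity(c1, c2, probs):
    """Largest information content among concepts subsuming both c1 and c2."""
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(information_content(probs[c]) for c in common)


probs = concept_probabilities()
print(similarity("dog", "cat", probs))  # shared subsumer "animal" is informative
print(similarity("dog", "car", probs))  # only the root is shared
```

In this toy setup, "dog" and "cat" share the fairly specific subsumer "animal", so they score higher than "dog" and "car", which share only the root. Edge counting would instead depend on path length alone, regardless of how much probability mass each concept covers.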

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques