Rethinking Word Similarity: Semantic Similarity through Classification Confusion

Kaitlyn Zhou, Haishan Gao, Sarah Chen, Dan Edelstein, Dan Jurafsky, Chen Shani

arXiv:2502.05704·cs.CL·February 11, 2025

TL;DR

This paper introduces Word Confusion, a similarity measure based on classifier confusion that captures the dynamic, context-dependent, and asymmetrical nature of semantic similarity. It matches human similarity judgments about as well as cosine similarity, and it can also incorporate predetermined features of interest and track meaning change over time.

Contribution

The paper proposes Word Confusion, a classifier-based similarity measure in which the set of candidate confounding words acts as the features. Because that set can be chosen dynamically, the measure can be tailored to a context of interest, which cosine similarity between word embeddings does not allow.

Findings

01 Comparable to cosine similarity in matching human judgments

02 Able to incorporate predetermined features of interest

03 Effectively captures meaning change over time

Abstract

Word similarity has many applications to social science and cultural analytics tasks like measuring meaning change over time and making sense of contested terms. Yet traditional similarity methods based on cosine similarity between word embeddings cannot capture the context-dependent, asymmetrical, polysemous nature of semantic similarity. We propose a new measure of similarity, Word Confusion, that reframes semantic similarity in terms of feature-based classification confusion. Word Confusion is inspired by Tversky's suggestion that similarity features be chosen dynamically. Here we train a classifier to map contextual embeddings to word identities and use the classifier confusion (the probability of choosing a confounding word c instead of the correct target word t) as a measure of the similarity of c and t. The set of potential confounding words acts as the chosen features. Our…
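The classifier-confusion idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the two-dimensional synthetic vectors stand in for contextual embeddings, the classifier is a plain softmax regression trained by gradient descent, and the names `train_classifier` and `word_confusion` are hypothetical. The confusion of target t with confounder c is taken here as the mean probability the classifier assigns to c over usages of t.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_classifier(X, y, n_classes, lr=0.5, epochs=300):
    """Fit a softmax regression (embedding -> word identity) by gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]              # one-hot word identities
    for _ in range(epochs):
        P = softmax(X @ W + b)
        G = (P - Y) / n                   # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b

def word_confusion(X, y, W, b, n_classes):
    """conf[t, c] = mean probability of choosing word c on usages of target t."""
    P = softmax(X @ W + b)
    return np.vstack([P[y == t].mean(axis=0) for t in range(n_classes)])

# Synthetic stand-in for contextual embeddings (assumption): "cat" and "dog"
# usages cluster near each other, "car" usages cluster far away.
rng = np.random.default_rng(0)
words = ["cat", "dog", "car"]
centers = np.array([[1.0, 0.0], [0.8, 0.6], [-1.0, 0.0]])
y = np.repeat(np.arange(len(words)), 200)
X = centers[y] + 0.3 * rng.normal(size=(len(y), 2))

W, b = train_classifier(X, y, len(words))
conf = word_confusion(X, y, W, b, len(words))

for t, target in enumerate(words):
    for c, confounder in enumerate(words):
        if t != c:
            print(f"confusion({target} -> {confounder}) = {conf[t, c]:.3f}")
```

Note that `conf[t, c]` and `conf[c, t]` are computed from different sets of usages, so the resulting measure need not be symmetric. Restricting `words` to a chosen set of confounders corresponds to selecting the features of interest.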

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling