RAW-C: Relatedness of Ambiguous Words--in Context (A New Lexical Resource for English)

Sean Trott; Benjamin Bergen

arXiv:2105.13266·cs.CL·May 28, 2021

TL;DR

This paper introduces RAW-C, a new dataset of human relatedness judgments for ambiguous words in context, and evaluates how well contextualized embeddings reflect human perceptions of word meaning similarity.

Contribution

The paper presents RAW-C, a novel dataset for evaluating lexical ambiguity in context, and analyzes the correlation between human judgments and embeddings from BERT and ELMo.

Findings

01

Cosine distance between contextualized embeddings (BERT, ELMo) correlates with human relatedness judgments.

02

Relative to human judgments, embeddings underestimate similarity between uses of the same sense.

03

Relative to human judgments, embeddings overestimate similarity between uses of different senses.
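
As a rough illustration of the measure these findings refer to, cosine distance between two vectors can be sketched as below. This is a minimal sketch, not the paper's code: the function name `cosine_distance` and the three-dimensional toy vectors are hypothetical stand-ins for contextualized embeddings of a target word in two sentences, and it assumes distance is defined as one minus cosine similarity.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical toy vectors (assumption): real embeddings from BERT or ELMo
# have hundreds of dimensions and come from running the model on a sentence.
use_in_sentence_1 = [0.9, 0.1, 0.2]
use_in_sentence_2 = [0.8, 0.2, 0.1]
use_in_sentence_3 = [0.1, 0.9, 0.3]

# A smaller distance means the two uses are closer in embedding space.
print(cosine_distance(use_in_sentence_1, use_in_sentence_2))
print(cosine_distance(use_in_sentence_1, use_in_sentence_3))
```

Comparing such distances against mean human relatedness ratings for the same sentence pairs is what lets one ask whether the embeddings under- or overestimate similarity.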

Abstract

Most words are ambiguous--i.e., they convey distinct meanings in different contexts--and even the meanings of unambiguous words are context-dependent. Both phenomena present a challenge for NLP. Recently, the advent of contextualized word embeddings has led to success on tasks involving lexical ambiguity, such as Word Sense Disambiguation. However, there are few tasks that directly evaluate how well these contextualized embeddings accommodate the more continuous, dynamic nature of word meaning--particularly in a way that matches human intuitions. We introduce RAW-C, a dataset of graded, human relatedness judgments for 112 ambiguous words in context (with 672 sentence pairs total), as well as human estimates of sense dominance. The average inter-annotator agreement (assessed using a leave-one-annotator-out method) was 0.79. We then show that a measure of cosine distance, computed using…
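
The abstract does not spell out the leave-one-annotator-out procedure, so the following is only one plausible reading: correlate each annotator's ratings with the mean rating of all other annotators, then average those correlations. The names `pearson` and `leave_one_out_agreement`, the use of Pearson correlation, and the assumption that every annotator rated every item are all illustrative choices, not details confirmed by the source.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / (sd_x * sd_y)


def leave_one_out_agreement(ratings):
    """Average correlation of each annotator with the mean of the others.

    `ratings` is a list of per-annotator rating lists over the same items
    (assumption: a complete annotator-by-item matrix).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two annotators")
    scores = []
    for i, held_out in enumerate(ratings):
        others = [r for j, r in enumerate(ratings) if j != i]
        # Item-wise mean rating across the remaining annotators.
        mean_others = [sum(col) / len(col) for col in zip(*others)]
        scores.append(pearson(held_out, mean_others))
    return sum(scores) / len(scores)


# Hypothetical ratings (assumption): three annotators, five sentence pairs.
toy_ratings = [
    [4, 3, 0, 1, 2],
    [4, 2, 1, 1, 3],
    [3, 3, 0, 2, 2],
]
print(leave_one_out_agreement(toy_ratings))
```

A rank-based correlation could be substituted for `pearson` without changing the surrounding loop.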

Code & Models

Repositories

seantrott/raw-c (Official)

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Sigmoid Activation · Residual Connection · Layer Normalization · Tanh Activation · Softmax · Attention Dropout · Long Short-Term Memory