TL;DR
This paper introduces RAW-C, a new dataset of human relatedness judgments for ambiguous words in context, and evaluates how well contextualized embeddings reflect human perceptions of word meaning similarity.
Contribution
The paper presents RAW-C, a novel dataset for evaluating lexical ambiguity in context, and analyzes the correlation between human judgments and embeddings from BERT and ELMo.
Findings
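Cosine distance between contextualized embeddings (BERT, ELMo) correlates with human relatedness judgments.
Relative to human judgments, embeddings underestimate similarity between uses of the same sense.
Relative to human judgments, embeddings overestimate similarity between uses of different senses.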
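The findings above compare cosine distance between embeddings with human relatedness ratings. A minimal sketch of that comparison follows, using only the Python standard library. The vectors and ratings are made-up toy values, not RAW-C data, and Pearson correlation is used here only for simplicity; this summary does not say which statistic the paper uses.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical 2-d "embeddings" of one target word in three sentence pairs.
embedding_pairs = [
    ([1.0, 0.0], [0.9, 0.1]),  # toy stand-in for a same-sense pair
    ([1.0, 0.0], [1.0, 1.0]),  # toy stand-in for a related-sense pair
    ([1.0, 0.0], [0.0, 1.0]),  # toy stand-in for an unrelated-sense pair
]
# Hypothetical mean human relatedness ratings for the same three pairs.
human_relatedness = [4.0, 2.5, 0.5]

distances = [cosine_distance(u, v) for u, v in embedding_pairs]
r = pearson(distances, human_relatedness)
print(f"correlation between distance and relatedness: {r:.3f}")
```

Because distance and relatedness point in opposite directions, agreement between a model and human raters shows up as a negative correlation in this setup.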
Abstract
Most words are ambiguous--i.e., they convey distinct meanings in different contexts--and even the meanings of unambiguous words are context-dependent. Both phenomena present a challenge for NLP. Recently, the advent of contextualized word embeddings has led to success on tasks involving lexical ambiguity, such as Word Sense Disambiguation. However, there are few tasks that directly evaluate how well these contextualized embeddings accommodate the more continuous, dynamic nature of word meaning--particularly in a way that matches human intuitions. We introduce RAW-C, a dataset of graded, human relatedness judgments for 112 ambiguous words in context (with 672 sentence pairs total), as well as human estimates of sense dominance. The average inter-annotator agreement (assessed using a leave-one-annotator-out method) was 0.79. We then show that a measure of cosine distance, computed using…
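The leave-one-annotator-out agreement mentioned in the abstract can be illustrated with a short sketch: each annotator's ratings are correlated with the mean of all other annotators' ratings, and the per-annotator scores are averaged. The ratings below are toy values, the function names are hypothetical, and two simplifications are assumed: every annotator rates every item, and Pearson correlation stands in for whatever statistic the authors used.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def leave_one_out_agreement(ratings):
    """Average correlation of each annotator with the mean of the others.

    `ratings` is a list with one inner list per annotator; all inner lists
    cover the same items in the same order (an assumption of this sketch).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two annotators")
    scores = []
    for i, held_out in enumerate(ratings):
        others = [r for j, r in enumerate(ratings) if j != i]
        # Item-wise mean rating across the remaining annotators.
        others_mean = [mean(column) for column in zip(*others)]
        scores.append(pearson(held_out, others_mean))
    return mean(scores)


# Toy ratings: three hypothetical annotators, four sentence pairs.
toy_ratings = [
    [4, 3, 1, 0],
    [4, 2, 1, 1],
    [3, 3, 0, 1],
]
agreement = leave_one_out_agreement(toy_ratings)
print(f"leave-one-annotator-out agreement: {agreement:.2f}")
```

Real annotation data usually has missing ratings, since not every annotator sees every item, so a practical version would restrict each comparison to the items the held-out annotator actually rated.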
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Sigmoid Activation · Residual Connection · Layer Normalization · Tanh Activation · Softmax · Attention Dropout · Long Short-Term Memory
