BERT Has Uncommon Sense: Similarity Ranking for Word Sense BERTology

Luke Gessler; Nathan Schneider

arXiv:2109.09780·cs.CL·September 22, 2021

TL;DR

This paper evaluates how well BERT and similar models represent different word senses, especially rare ones, by analyzing their embedding neighborhoods without explicit sense supervision, revealing significant variability among models.

Contribution

The paper introduces a neighborhood-based retrieval method for assessing sense representation in contextualized word embedding (CWE) models, highlighting performance differences among models, particularly for uncommon senses.

Findings

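01 Several popular CWE models all outperform a random baseline on sense ranking, even for proportionally rare senses.

02 Performance varies considerably among models, even those with similar architectures and pretraining regimes, with especially large differences for rare senses.

03 Models differ in their ability to approximate word senses without explicit sense supervision.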

Abstract

An important question concerning contextualized word embedding (CWE) models like BERT is how well they can represent different word senses, especially those in the long tail of uncommon senses. Rather than build a WSD system as in previous work, we investigate contextualized embedding neighborhoods directly, formulating a query-by-example nearest neighbor retrieval task and examining ranking performance for words and senses in different frequency bands. In an evaluation on two English sense-annotated corpora, we find that several popular CWE models all outperform a random baseline even for proportionally rare senses, without explicit sense supervision. However, performance varies considerably even among models with similar architectures and pretraining regimes, with especially large differences for rare word senses, revealing that CWE models are not all created equal when it comes to…
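The query-by-example retrieval idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the similarity function or the ranking metric, so cosine similarity and precision at k are assumptions here, as are the toy 2-dimensional vectors, the sense labels, and the helper names `cosine`, `rank_neighbors`, and `precision_at_k`.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_neighbors(query_vec, database):
    """Return the sense labels of database items, most similar to the query first.

    database is a list of (vector, sense_label) pairs.
    """
    scored = sorted(database, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [label for _, label in scored]

def precision_at_k(ranked_labels, query_label, k):
    """Fraction of the top-k retrieved items that share the query's sense label."""
    top = ranked_labels[:k]
    if not top:
        return 0.0
    return sum(1 for label in top if label == query_label) / len(top)

# Hypothetical toy data: in practice each vector would be a contextualized
# embedding of one annotated token, not a hand-written 2-d point.
database = [
    ([1.0, 0.1], "bank.river"),
    ([0.9, 0.2], "bank.river"),
    ([0.1, 1.0], "bank.finance"),
    ([0.2, 0.9], "bank.finance"),
    ([0.0, 1.0], "bank.finance"),
]
query_vec, query_label = [1.0, 0.0], "bank.river"

ranked = rank_neighbors(query_vec, database)
p_at_2 = precision_at_k(ranked, query_label, 2)
p_at_4 = precision_at_k(ranked, query_label, 4)
```

In this toy setup the two river-sense tokens rank first, so precision at 2 is perfect and drops once the cutoff extends past the only two same-sense neighbors. Grouping such scores by how frequent the query's sense is would give the per-frequency-band view the abstract describes.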

Code & Models

Repositories

lgessler/bert-has-uncommon-sense (official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Dense Connections · Multi-Head Attention · Softmax · Linear Warmup With Linear Decay · Dropout · Attention Dropout · Weight Decay