Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings

Gregor Wiedemann; Steffen Remus; Avi Chawla; Chris Biemann

arXiv:1909.10430 · cs.CL · October 2, 2019 · 121 cites

TL;DR

This paper investigates how well contextualized word embeddings, especially BERT, support word sense disambiguation, and shows that BERT separates the senses of polysemous words better than ELMo and Flair NLP.

Contribution

It introduces a simple nearest neighbor approach to WSD using CWEs and shows that BERT separates word senses in embedding space better than ELMo and Flair NLP.

Findings

01

BERT outperforms other CWEs in WSD tasks.

02

BERT can distinguish different word senses in embedding space.

03

A simple nearest neighbor method improves on the state of the art for two standard WSD benchmark datasets.

Abstract

Contextualized word embeddings (CWE) such as provided by ELMo (Peters et al., 2018), Flair NLP (Akbik et al., 2018), or BERT (Devlin et al., 2019) are a major recent innovation in NLP. CWEs provide semantic vector representations of words depending on their respective context. Their advantage over static word embeddings has been shown for a number of tasks, such as text classification, sequence tagging, or machine translation. Since vectors of the same word type can vary depending on the respective context, they implicitly provide a model for word sense disambiguation (WSD). We introduce a simple but effective approach to WSD using a nearest neighbor classification on CWEs. We compare the performance of different CWE models for the task and can report improvements above the current state of the art for two standard WSD benchmark datasets. We further show that the pre-trained BERT model…
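The nearest neighbor classification described in the abstract can be illustrated with a minimal sketch. It assumes that each labelled occurrence of an ambiguous word has already been turned into a vector by some CWE model, and that a new occurrence receives the majority sense among its k most similar labelled vectors. The use of cosine similarity, the value of k, the toy 3-dimensional vectors, and the sense labels are all assumptions made for illustration; none of them are taken from the paper.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def knn_sense(query, labelled, k=1):
    """Return the majority sense among the k labelled vectors closest to query.

    labelled is a list of (vector, sense_label) pairs.
    """
    ranked = sorted(labelled, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(sense for _, sense in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy vectors standing in for contextualized embeddings of "bank".
train = [
    ([0.9, 0.1, 0.0], "bank%finance"),
    ([0.8, 0.2, 0.1], "bank%finance"),
    ([0.1, 0.9, 0.2], "bank%river"),
    ([0.0, 0.8, 0.3], "bank%river"),
]

query = [0.2, 0.7, 0.1]  # embedding of a new, unlabelled occurrence
predicted = knn_sense(query, train, k=3)
print(predicted)  # bank%river
```

In a real setting the vectors would come from a pre-trained model such as BERT, ELMo, or Flair, and the labelled pairs from a sense-annotated corpus; only the classification step is sketched here.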

Code & Models

Repositories

uhh-lt/bert-sense (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Linear Layer · Sigmoid Activation · Tanh Activation · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam