What do Language Models know about word senses? Zero-Shot WSD with Language Models and Domain Inventories
Oscar Sainz, Oier Lopez de Lacalle, Eneko Agirre, German Rigau

TL;DR
This paper investigates how well language models such as BERT and RoBERTa perform zero-shot word sense disambiguation (WSD) when the task is framed as textual entailment over the domains of word senses, and reports results close to those of supervised systems.
Contribution
It introduces a zero-shot WSD method that uses domain-based textual entailment prompts and reports results close to those of supervised systems.
Findings
Language models can effectively perform zero-shot WSD.
Framing WSD as a textual entailment task over sense domains is an effective way to disambiguate.
The approach comes close to the performance of supervised systems.
Abstract
Language Models are the core for almost any Natural Language Processing system nowadays. One of their particularities is their contextualized representations, a game-changing feature when a disambiguation between word senses is necessary. In this paper we aim to explore to what extent language models are capable of discerning among senses at inference time. We performed this analysis by prompting commonly used Language Models such as BERT or RoBERTa to perform the task of Word Sense Disambiguation (WSD). We leverage the relation between word senses and domains, and cast WSD as a textual entailment problem, where the different hypotheses refer to the domains of the word senses. Our results show that this approach is indeed effective, close to supervised systems.
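The entailment framing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sense keys, the domain labels, the hypothesis template, and the keyword-overlap scorer are all assumptions made for illustration. In practice the scorer would be replaced by the entailment probability from a language model fine-tuned on natural language inference.

```python
from typing import Callable, Dict

# Hypothetical sense inventory: each sense of "bank" is mapped to a domain label.
SENSE_DOMAINS: Dict[str, str] = {
    "bank%finance": "finance",
    "bank%geography": "geography",
}

# Hypothetical keyword lists used only by the toy scorer below.
DOMAIN_KEYWORDS: Dict[str, set] = {
    "finance": {"money", "loan", "deposit", "account"},
    "geography": {"river", "shore", "water", "mud"},
}


def make_hypothesis(domain: str) -> str:
    """Verbalize a domain label as an entailment hypothesis (assumed template)."""
    return f"The domain of the sentence is about {domain}."


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model: count premise tokens that are keywords
    of whichever domain the hypothesis mentions."""
    tokens = set(premise.lower().replace(".", " ").split())
    score = 0.0
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if domain in hypothesis.lower():
            score += len(tokens & keywords)
    return score


def disambiguate(
    context: str,
    sense_domains: Dict[str, str],
    scorer: Callable[[str, str], float] = toy_entailment_score,
) -> str:
    """Return the sense whose domain hypothesis is best supported by the context."""
    scores = {
        sense: scorer(context, make_hypothesis(domain))
        for sense, domain in sense_domains.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(disambiguate("She opened an account at the bank to deposit money.", SENSE_DOMAINS))
    print(disambiguate("They sat on the bank of the river near the water.", SENSE_DOMAINS))
```

Because the scorer is passed in as a parameter, the same `disambiguate` function works unchanged with any function that maps a (premise, hypothesis) pair to a score, so no task-specific training is needed for the zero-shot setting.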
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research
Methods: Multi-Head Attention · Attention Is All You Need · Residual Connection · Weight Decay · Dropout · Dense Connections · Attention Dropout · Linear Layer · Layer Normalization
