Using BERT for Word Sense Disambiguation
Jiaju Du, Fanchao Qi, Maosong Sun

TL;DR
This paper uses BERT to extract better polyseme representations for Word Sense Disambiguation and trains a unified classifier with sense definitions, achieving state-of-the-art results on the standard English All-word WSD evaluation.
Contribution
It introduces a novel approach combining BERT with sense definitions for a unified WSD classifier capable of disambiguating unseen polysemes.
Findings
Achieved state-of-the-art performance on English All-word WSD
Demonstrated effectiveness of sense definitions in training
Unified classifier handles unseen polysemes
Abstract
Word Sense Disambiguation (WSD), which aims to identify the correct sense of a given polyseme, is a long-standing problem in NLP. In this paper, we propose to use BERT to extract better polyseme representations for WSD and explore several ways of combining BERT and the classifier. We also utilize sense definitions to train a unified classifier for all words, which enables the model to disambiguate unseen polysemes. Experiments show that our model achieves state-of-the-art results on the standard English All-word WSD evaluation.
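The abstract does not spell out the architecture, so the following is only an illustrative sketch of why scoring sense definitions lets one classifier cover words it never saw in training. It assumes a contextual vector for the target word and one vector per sense definition are already available (in practice these might come from BERT). The toy vectors, the sense labels, and the `disambiguate` helper are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def disambiguate(target_vec, sense_vecs):
    """Score each candidate sense against the target word's vector.

    target_vec: vector for the polyseme in context (assumed given).
    sense_vecs: dict mapping a sense label to its definition vector.
    Returns the best-scoring label and the probability of each sense.
    """
    names = list(sense_vecs)
    probs = softmax([dot(target_vec, sense_vecs[n]) for n in names])
    best = max(range(len(names)), key=lambda i: probs[i])
    return names[best], dict(zip(names, probs))


# Toy stand-ins for encoder outputs (hypothetical values).
target = [0.9, 0.1, 0.0]  # "bank" in "sat on the river bank"
senses = {
    "bank%financial": [0.1, 0.9, 0.2],
    "bank%river": [0.8, 0.2, 0.1],
}
best, probs = disambiguate(target, senses)
print(best, probs)
```

Because the candidate senses are passed in as definition vectors instead of being fixed output classes, the same scoring function applies to any word that has definitions, including polysemes absent from the training data.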
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax
