TL;DR
This paper presents a novel approach to word sense disambiguation by fine-tuning BERT as a relevance ranking model with data augmentation, achieving state-of-the-art results on benchmark datasets.
Contribution
It formulates WSD as a relevance ranking task and introduces a data augmentation method using WordNet examples for improved performance.
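The ranking formulation can be illustrated with a minimal sketch. It assumes that each (context sentence, candidate sense definition) pair receives one scalar score, and that training uses a softmax cross-entropy over the candidates of the target word. The function names and the scores below are hypothetical; in the paper the scores would come from fine-tuned BERT, which is not reproduced here.

```python
import math

def build_pairs(context, glosses):
    """Pair the context sentence with every candidate sense definition."""
    return [(context, gloss) for gloss in glosses]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ranking_loss(scores, gold_index):
    """Cross-entropy of the gold definition against all candidates (assumed objective)."""
    return -math.log(softmax(scores)[gold_index])

def pick_sense(scores):
    """Return the index of the highest-scoring candidate definition."""
    return max(range(len(scores)), key=scores.__getitem__)

context = "He sat on the bank of the river."
glosses = [
    "a financial institution that accepts deposits",
    "sloping land beside a body of water",
]
pairs = build_pairs(context, glosses)

# Hypothetical scores, one per pair; a real model would produce these.
scores = [0.3, 2.1]
predicted = pick_sense(scores)
print(glosses[predicted])
```

Scoring all candidates of one word together, rather than classifying each pair independently, is what makes this a ranking problem: only the relative order of the candidate definitions matters.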
Findings
Achieves state-of-the-art results on English all-words WSD datasets
Relevance ranking formulation improves sense disambiguation accuracy
Data augmentation enhances model robustness and generalization
Abstract
Domain adaptation or transfer learning using pre-trained language models such as BERT has proven to be an effective approach for many natural language processing tasks. In this work, we propose to formulate word sense disambiguation as a relevance ranking task, and fine-tune BERT on a sequence-pair ranking task to select the most probable sense definition given a context sentence and a list of candidate sense definitions. We also introduce a data augmentation technique for WSD using existing example sentences from WordNet. Using the proposed training objective and data augmentation technique, our models are able to achieve state-of-the-art results on the English all-words benchmark datasets.
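One plausible reading of the data augmentation step is sketched below: each example sentence attached to a sense is treated as an extra context sentence whose gold label is that sense. This is an assumption about the procedure, not a statement of the authors' exact method. The `Sense` class, the sense keys, and the toy inventory are hypothetical stand-ins for WordNet entries.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sense:
    key: str          # toy identifier, not a real WordNet sense key
    gloss: str        # sense definition
    examples: tuple   # example sentences listed for this sense

def augment_from_examples(lemma, senses):
    """Turn each example sentence into a training instance for its own sense."""
    glosses = [s.gloss for s in senses]
    instances = []
    for gold_index, sense in enumerate(senses):
        for sentence in sense.examples:
            # Simplification: skip sentences that do not contain the lemma
            # verbatim; inflected forms are not handled in this sketch.
            if lemma.lower() not in sentence.lower():
                continue
            instances.append({
                "context": sentence,
                "target": lemma,
                "glosses": glosses,
                "gold_index": gold_index,
            })
    return instances

# Hypothetical WordNet-like entries for the noun "bank".
bank_senses = [
    Sense("bank-1", "a financial institution that accepts deposits",
          ("he cashed a check at the bank",)),
    Sense("bank-2", "sloping land beside a body of water",
          ("they pulled the canoe up on the bank",)),
]

extra = augment_from_examples("bank", bank_senses)
print(len(extra))
```

Each generated instance has the same shape as an annotated training example (context, candidate definitions, gold index), so it could be mixed into the training set without changing the ranking objective.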
Taxonomy
Methods: Linear Layer · Adam · Softmax · Dense Connections · Dropout · Linear Warmup With Linear Decay · Layer Normalization · Weight Decay · Attention Dropout
