Training Bi-Encoders for Word Sense Disambiguation

Harsh Kohli

arXiv:2105.10146·cs.CL·May 24, 2021

TL;DR

This paper improves bi-encoder models for Word Sense Disambiguation (WSD) by optimizing their training strategies and the way lexical information is presented to them, advancing the state of the art through a multi-stage pre-training and fine-tuning pipeline.

Contribution

It introduces training strategies for bi-encoders and alternative ways of presenting lexical information to the model, advancing the state of the art in WSD.

Findings

01 Achieved new state-of-the-art WSD performance

02 Demonstrated the effectiveness of multi-stage pre-training

03 Proposed alternative methods of presenting lexical information to the model

Abstract

Modern transformer-based neural architectures yield impressive results in nearly every NLP task, and Word Sense Disambiguation (WSD), the problem of discerning the correct sense of a word in a given context, is no exception. State-of-the-art approaches in WSD today leverage lexical information along with pre-trained embeddings from these models to achieve results comparable to human inter-annotator agreement on standard evaluation benchmarks. In the same vein, we experiment with several strategies to optimize bi-encoders for this task and propose alternative methods of presenting lexical information to our model. Through our multi-stage pre-training and fine-tuning pipeline, we further the state of the art in Word Sense Disambiguation.
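To make the bi-encoder setup concrete: a bi-encoder for WSD typically embeds the context sentence and each candidate sense's gloss with two separate encoders, then selects the sense whose gloss embedding scores highest against the target word's contextual embedding. The sketch below illustrates only that scoring step plus one common training objective (softmax cross-entropy over the candidate glosses), which is an assumption on our part. It is not the paper's implementation. The vectors are toy placeholders standing in for transformer encoder outputs, and the sense labels `bank_financial` and `bank_river` as well as all function names are hypothetical.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def score_senses(context_vec: Vector, gloss_vecs: Dict[str, Vector]) -> Dict[str, float]:
    """Score each candidate sense by the dot product of its gloss embedding
    with the contextual embedding of the target word."""
    return {sense: dot(context_vec, gloss) for sense, gloss in gloss_vecs.items()}


def predict_sense(context_vec: Vector, gloss_vecs: Dict[str, Vector]) -> str:
    """Return the candidate sense with the highest score."""
    scores = score_senses(context_vec, gloss_vecs)
    return max(scores, key=scores.get)


def cross_entropy_loss(context_vec: Vector, gloss_vecs: Dict[str, Vector], gold_sense: str) -> float:
    """Softmax cross-entropy over candidate senses (assumed objective).
    Uses the log-sum-exp trick for numerical stability."""
    scores = score_senses(context_vec, gloss_vecs)
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return log_z - scores[gold_sense]


# Toy example: placeholder embeddings for "bank" in a financial context.
context = [0.9, 0.1, 0.0]
glosses = {
    "bank_financial": [1.0, 0.0, 0.0],  # hypothetical gloss embedding
    "bank_river": [0.0, 1.0, 0.0],      # hypothetical gloss embedding
}
print(predict_sense(context, glosses))
print(cross_entropy_loss(context, glosses, "bank_financial"))
```

In practice the gloss embeddings for a lemma's senses can be computed once and cached, since the gloss encoder never sees the context. This independence is what separates a bi-encoder from a cross-encoder, which must re-encode every context-gloss pair.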
