Word Sense Induction with Knowledge Distillation from BERT
Anik Saha, Alex Gittens, Bulent Yener

TL;DR
This paper introduces a two-stage knowledge distillation method from BERT to create multi-sense word embeddings, improving sense disambiguation and performance on similarity and sense induction tasks.
Contribution
It presents a novel approach to distill multiple word senses from BERT into efficient multi-sense embeddings using attention and a skip-gram-like framework.
Findings
The method outperforms or matches state-of-the-art multi-sense embeddings on contextual word similarity and sense induction benchmarks.
The sense-aware embeddings improve downstream tasks such as topic modeling.
The sense disambiguation mechanism is trained effectively using sense distributions derived from BERT's output-layer embeddings.
Abstract
Pre-trained contextual language models are ubiquitously employed for language understanding tasks, but are unsuitable for resource-constrained systems. Noncontextual word embeddings are an efficient alternative in these settings. Such methods typically use one vector to encode multiple different meanings of a word, and incur errors due to polysemy. This paper proposes a two-stage method to distill multiple word senses from a pre-trained language model (BERT) by using attention over the senses of a word in a context and transferring this sense information to fit multi-sense embeddings in a skip-gram-like framework. We demonstrate an effective approach to training the sense disambiguation mechanism in our model with a distribution over word senses extracted from the output layer embeddings of BERT. Experiments on the contextual word similarity and sense induction tasks show that this…
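To make the idea of "attention over the senses of a word in a context" more concrete, here is a minimal sketch, not the authors' implementation. It assumes dot-product attention between each sense vector and the mean of the context word vectors, and a KL-divergence loss against a teacher sense distribution. The function names, the toy vectors, and the placeholder teacher distribution are all hypothetical; the paper's actual scoring function and loss may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sense_attention(sense_vecs, context_vecs):
    """Return a distribution over senses given context word vectors.

    Assumption: senses are scored by dot product with the averaged context.
    """
    context = mean_vector(context_vecs)
    return softmax([dot(s, context) for s in sense_vecs])


def kl_divergence(teacher, student, eps=1e-12):
    """KL(teacher || student), used here as an illustrative distillation loss."""
    return sum(
        t * (math.log(t + eps) - math.log(s + eps))
        for t, s in zip(teacher, student)
    )


# Toy example: one word with two senses in a 2-dimensional embedding space.
sense_vecs = [[1.0, 0.0], [0.0, 1.0]]
context_vecs = [[0.9, 0.1], [0.7, 0.3]]

student = sense_attention(sense_vecs, context_vecs)
teacher = [0.8, 0.2]  # placeholder for a BERT-derived sense distribution
loss = kl_divergence(teacher, student)
```

In this sketch the student distribution favors the first sense because the context vectors lie closer to it. Minimizing the loss would push the student's sense weights toward the teacher distribution.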
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Attention Dropout · WordPiece · Dense Connections · Dropout · Weight Decay
