Improving Contextual Representation with Gloss Regularized Pre-training
Yu Lin, Zhecheng An, Peihao Wu, Zejun Ma

TL;DR
This paper introduces GR-BERT, which adds an auxiliary gloss regularizer to BERT pre-training to model word semantic similarity explicitly. It improves lexical and sentence-level semantic representations and sets a new state of the art in lexical substitution.
Contribution
The paper proposes a gloss regularizer module for BERT pre-training that strengthens word semantic similarity modeling and addresses the discrepancy between pre-training and inference.
Findings
GR-BERT improves lexical substitution performance.
Improves sentence representations on semantic textual similarity (STS) tasks.
Achieves new state-of-the-art in lexical substitution.
Abstract
Though achieving impressive results on many NLP tasks, the BERT-like masked language models (MLM) encounter the discrepancy between pre-training and inference. In light of this gap, we investigate the contextual representation of pre-training and inference from the perspective of word probability distribution. We discover that BERT risks neglecting the contextual word similarity in pre-training. To tackle this issue, we propose an auxiliary gloss regularizer module to BERT pre-training (GR-BERT), to enhance word semantic similarity. By predicting masked words and aligning contextual embeddings to corresponding glosses simultaneously, the word similarity can be explicitly modeled. We design two architectures for GR-BERT and evaluate our model in downstream tasks. Experimental results show that the gloss regularizer benefits BERT in word-level and sentence-level semantic representation.…
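The abstract describes training on two signals at once: predicting masked words and aligning each contextual embedding with its gloss. A minimal sketch of how such a joint objective could be assembled is given below. The in-batch contrastive form of the alignment term, the dot-product scoring, and the `gloss_weight` factor are all assumptions made for illustration; the abstract does not specify the loss, and the paper's two architectures may differ from this.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mlm_loss(vocab_logits, target_ids):
    """Mean cross-entropy of the true token at each masked position."""
    losses = [-log_softmax(logits)[t] for logits, t in zip(vocab_logits, target_ids)]
    return sum(losses) / len(losses)


def gloss_alignment_loss(context_vecs, gloss_vecs):
    """Assumed in-batch contrastive term (hypothetical, not from the paper):
    context vector i should score highest against gloss vector i."""
    losses = []
    for i, c in enumerate(context_vecs):
        scores = [dot(c, g) for g in gloss_vecs]
        losses.append(-log_softmax(scores)[i])
    return sum(losses) / len(losses)


def joint_loss(vocab_logits, target_ids, context_vecs, gloss_vecs, gloss_weight=1.0):
    """MLM loss plus a weighted gloss term; gloss_weight is a hypothetical knob."""
    return mlm_loss(vocab_logits, target_ids) + gloss_weight * gloss_alignment_loss(
        context_vecs, gloss_vecs
    )
```

In this sketch, a masked word whose contextual vector sits close to its own gloss vector, and far from the other glosses in the batch, contributes a small alignment loss. Words with similar glosses are therefore pulled toward similar representations, which is the effect the abstract attributes to the regularizer.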
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Linear Layer · WordPiece · Weight Decay · Multi-Head Attention · Attention Dropout · Dropout · Adam · Layer Normalization
