Improving Contextual Representation with Gloss Regularized Pre-training

Yu Lin; Zhecheng An; Peihao Wu; Zejun Ma

arXiv:2205.06603·cs.CL·May 16, 2022

Open Access

TL;DR

This paper introduces GR-BERT, which adds an auxiliary gloss regularizer to BERT pre-training to model word semantic similarity explicitly. The regularizer improves lexical and sentence-level semantic representations and yields state-of-the-art results on lexical substitution.

Contribution

The paper proposes a novel gloss regularizer module for BERT pre-training that strengthens word semantic similarity modeling and targets the discrepancy between pre-training and inference.

Findings

01 GR-BERT improves lexical substitution performance.

02 GR-BERT produces better sentence representations on semantic textual similarity (STS) tasks.

03 GR-BERT achieves a new state of the art in lexical substitution.

Abstract

Though achieving impressive results on many NLP tasks, the BERT-like masked language models (MLM) encounter the discrepancy between pre-training and inference. In light of this gap, we investigate the contextual representation of pre-training and inference from the perspective of word probability distribution. We discover that BERT risks neglecting the contextual word similarity in pre-training. To tackle this issue, we propose an auxiliary gloss regularizer module to BERT pre-training (GR-BERT), to enhance word semantic similarity. By predicting masked words and aligning contextual embeddings to corresponding glosses simultaneously, the word similarity can be explicitly modeled. We design two architectures for GR-BERT and evaluate our model in downstream tasks. Experimental results show that the gloss regularizer benefits BERT in word-level and sentence-level semantic representation.…
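
The abstract describes training on two signals at once: predicting the masked word and aligning its contextual embedding with the matching gloss. The toy sketch below illustrates one way such a joint objective could be written. It is not the authors' formulation: the contrastive form of the alignment term, the dot-product scoring, the `weight` parameter, and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mlm_loss(vocab_logits, target_id):
    """Cross-entropy for predicting the masked word from vocabulary logits."""
    return -math.log(softmax(vocab_logits)[target_id])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gloss_alignment_loss(context_vec, gloss_vecs, gold_index):
    """Assumed contrastive term: the gold gloss should score highest
    among the candidate glosses for the masked position."""
    scores = [dot(context_vec, g) for g in gloss_vecs]
    return -math.log(softmax(scores)[gold_index])


def joint_loss(vocab_logits, target_id, context_vec, gloss_vecs, gold_index,
               weight=1.0):
    """MLM loss plus a weighted gloss regularizer (weight is hypothetical)."""
    return (mlm_loss(vocab_logits, target_id)
            + weight * gloss_alignment_loss(context_vec, gloss_vecs, gold_index))


# Toy inputs: a 4-word vocabulary and two candidate gloss embeddings.
logits = [0.0, 0.0, 0.0, 0.0]
context = [1.0, 0.0]
glosses = [[1.0, 0.0], [0.0, 1.0]]

aligned = joint_loss(logits, 0, context, glosses, gold_index=0)
misaligned = joint_loss(logits, 0, context, glosses, gold_index=1)
print(aligned < misaligned)
```

In this toy setup the loss is lower when the contextual embedding points toward its gold gloss, which is the behavior a gloss regularizer is meant to reward alongside the usual masked-word prediction.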

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Linear Layer · WordPiece · Weight Decay · Multi-Head Attention · Attention Dropout · Dropout · Adam · Layer Normalization