Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks

Jason Phang; Thibault Févry; Samuel R. Bowman

arXiv:1811.01088 · cs.CL · March 1, 2019 · 258 cites

TL;DR

Further training a pretrained sentence encoder such as BERT or ELMo on a data-rich intermediate supervised task, before fine-tuning it on the target task, improves performance on language understanding benchmarks, especially when target-task training data is limited.

Contribution

This paper demonstrates that supplementary training on intermediate labeled-data tasks improves sentence encoder performance beyond what language model-style pretraining alone provides, achieving state-of-the-art results on GLUE as of 02/24/2019.
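The two-stage recipe described above can be illustrated with a minimal sketch. This is a toy stand-in, not the authors' implementation: `ToyEncoder`, `new_head`, `train`, `make_toy_data`, and `stilts` are hypothetical names, the "encoder" is a single tanh layer rather than BERT or ELMo, and both datasets are synthetic. It only shows the control flow: train the shared encoder on a data-rich intermediate task, discard that task's output head, then fine-tune the same encoder with a fresh head on the smaller target task.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ToyEncoder:
    """Hypothetical stand-in for a pretrained sentence encoder (one tanh layer)."""

    def __init__(self, in_dim, hid_dim, rng):
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                  for _ in range(hid_dim)]

    def encode(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w]


def new_head(hid_dim, rng):
    """Fresh task-specific output layer (binary classifier weights)."""
    return [rng.uniform(-0.5, 0.5) for _ in range(hid_dim)]


def predict(encoder, head, x):
    h = encoder.encode(x)
    return sigmoid(sum(a * b for a, b in zip(head, h)))


def train(encoder, head, data, epochs=20, lr=0.1):
    """Plain SGD on binary cross-entropy; updates encoder and head in place."""
    for _ in range(epochs):
        for x, y in data:
            h = encoder.encode(x)
            err = sigmoid(sum(a * b for a, b in zip(head, h))) - y  # dL/dlogit
            for j in range(len(head)):
                # Backpropagate through tanh before the head weight changes.
                grad_pre = err * head[j] * (1.0 - h[j] ** 2)
                for i in range(len(x)):
                    encoder.w[j][i] -= lr * grad_pre * x[i]
                head[j] -= lr * err * h[j]


def make_toy_data(n, label_fn, rng, dim=3):
    """Synthetic examples only; not drawn from any real benchmark."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        data.append((x, 1.0 if label_fn(x) else 0.0))
    return data


def stilts(encoder, intermediate_data, target_data, rng):
    hid_dim = len(encoder.w)
    # Stage 1: supplementary training on the data-rich intermediate task.
    intermediate_head = new_head(hid_dim, rng)
    train(encoder, intermediate_head, intermediate_data)
    # Stage 2: drop the intermediate head, fine-tune on the target task.
    target_head = new_head(hid_dim, rng)
    train(encoder, target_head, target_data)
    return target_head


rng = random.Random(0)
encoder = ToyEncoder(in_dim=3, hid_dim=4, rng=rng)
intermediate = make_toy_data(200, lambda x: x[0] + x[1] > 0, rng)  # large task
target = make_toy_data(10, lambda x: x[0] > 0, rng)                # small task
target_head = stilts(encoder, intermediate, target, rng)
print(predict(encoder, target_head, [0.9, 0.0, 0.0]))
```

The key design point is that only the encoder weights carry over between stages; each task gets its own freshly initialized head. In practice the encoder would be a pretrained model and the intermediate task something like natural language inference.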

Findings

1. Applying supplementary training to BERT attains a GLUE score of 81.8, the state of the art as of 02/24/2019 and a 1.4 point improvement over BERT alone.

2. Supplementary training reduces variance across random restarts.

3. The benefits of supplementary training are particularly pronounced in data-constrained regimes.

Abstract

Pretraining sentence encoders with language modeling and related unsupervised tasks has recently been shown to be very effective for language understanding tasks. By supplementing language model-style pretraining with further training on data-rich supervised tasks, such as natural language inference, we obtain additional performance improvements on the GLUE benchmark. Applying supplementary training on BERT (Devlin et al., 2018), we attain a GLUE score of 81.8, the state of the art (as of 02/24/2019) and a 1.4 point improvement over BERT. We also observe reduced variance across random restarts in this setting. Our approach yields similar improvements when applied to ELMo (Peters et al., 2018a) and Radford et al. (2018)'s model. In addition, the benefits of supplementary training are particularly pronounced in data-constrained regimes, as we show in experiments with artificially limited…

Code & Models

Repositories

zphang/bert_on_stilts (PyTorch)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Sigmoid Activation · Tanh Activation · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam