SLM: Learning a Discourse Language Representation with Sentence Unshuffling

Haejun Lee; Drew A. Hudson; Kangwook Lee; Christopher D. Manning

arXiv:2010.16249 · cs.CL · November 2, 2020

TL;DR

This paper proposes Sentence-level Language Modeling (SLM), a self-supervised pre-training objective that shuffles the sentences of an input and trains the model to reconstruct their original order. The objective is meant to strengthen discourse-level representations and improve performance on downstream NLP tasks.

Contribution

The paper introduces a sentence-unshuffling pre-training objective that encourages language models to capture intermediate-size structures, such as sentences and the relationships among them.

Findings

01 Improved performance on GLUE, SQuAD, and DiscoEval tasks.

02 Enhanced sentence-level and discourse-level representations.

03 Significantly outperforms the original BERT.

Abstract

We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation in a fully self-supervised manner. Recent pre-training methods in NLP focus on learning either bottom or top-level language representations: contextualized word representations derived from language model objectives at one extreme and a whole sequence representation learned by order classification of two given textual segments at the other. However, these models are not directly encouraged to capture representations of intermediate-size structures that exist in natural languages such as sentences and the relationships among them. To that end, we propose a new approach to encourage learning of a contextualized sentence-level representation by shuffling the sequence of input sentences and training a hierarchical transformer model to reconstruct the original…
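The data side of the unshuffling objective described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function names `make_unshuffling_example` and `restore` are hypothetical, sentences are assumed to be already split, and the hierarchical transformer and its training loss are not reproduced here. The sketch shows how a shuffled input and the corresponding reconstruction targets might be built.

```python
import random
from typing import List, Tuple


def make_unshuffling_example(
    sentences: List[str], rng: random.Random
) -> Tuple[List[str], List[int]]:
    """Shuffle sentences and build targets for recovering the original order.

    Hypothetical helper for illustration. targets[k] is the position in the
    shuffled list of the sentence that was originally k-th.
    """
    order = list(range(len(sentences)))
    rng.shuffle(order)
    # shuffled[j] holds the sentence that was originally at index order[j].
    shuffled = [sentences[i] for i in order]
    targets = [0] * len(sentences)
    for shuffled_pos, original_pos in enumerate(order):
        targets[original_pos] = shuffled_pos
    return shuffled, targets


def restore(shuffled: List[str], targets: List[int]) -> List[str]:
    """Rebuild the original sentence order from the shuffled input and targets."""
    return [shuffled[pos] for pos in targets]


if __name__ == "__main__":
    doc = ["First sentence.", "Second sentence.", "Third sentence."]
    shuffled, targets = make_unshuffling_example(doc, random.Random(0))
    print(shuffled, targets)
    print(restore(shuffled, targets) == doc)
```

In a setup like this, a model would see `shuffled` as input and be trained to predict `targets`; how the prediction is made and scored is left to the paper.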

Taxonomy

Methods: Linear Layer · Dense Connections · Linear Warmup With Linear Decay · Multi-Head Attention · Weight Decay · Attention Is All You Need · Residual Connection · Attention Dropout · Layer Normalization