SLM: Learning a Discourse Language Representation with Sentence Unshuffling
Haejun Lee, Drew A. Hudson, Kangwook Lee, Christopher D. Manning

TL;DR
This paper proposes Sentence-level Language Modeling (SLM), a self-supervised pre-training method that strengthens discourse-level representations by shuffling sentences and training the model to reconstruct their original order, improving performance on downstream NLP tasks.
Contribution
It introduces a novel sentence shuffling pre-training objective to better capture intermediate discourse structures in language models.
Findings
Improved performance on GLUE, SQuAD, and DiscoEval tasks.
Enhanced sentence-level and discourse representations.
Significantly outperforms the original BERT.
Abstract
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation in a fully self-supervised manner. Recent pre-training methods in NLP focus on learning either bottom or top-level language representations: contextualized word representations derived from language model objectives at one extreme and a whole sequence representation learned by order classification of two given textual segments at the other. However, these models are not directly encouraged to capture representations of intermediate-size structures that exist in natural languages such as sentences and the relationships among them. To that end, we propose a new approach to encourage learning of a contextualized sentence-level representation by shuffling the sequence of input sentences and training a hierarchical transformer model to reconstruct the original…
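To make the unshuffling objective concrete, here is a minimal sketch of how one training example might be built, assuming the input text is already split into a list of sentences. The function names `make_unshuffling_example` and `restore` and the pointer-style target encoding are hypothetical illustrations: the summary above does not say how SLM encodes its reconstruction target or how its hierarchical transformer decodes it.

```python
import random
from typing import List, Optional, Tuple


def make_unshuffling_example(
    sentences: List[str], seed: Optional[int] = None
) -> Tuple[List[str], List[int]]:
    """Shuffle sentences and return (shuffled_sentences, target).

    Hypothetical pointer-style target: target[k] is the slot in `shuffled`
    holding the sentence that originally came k-th, so reading
    shuffled[target[0]], shuffled[target[1]], ... restores the original order.
    """
    rng = random.Random(seed)
    perm = list(range(len(sentences)))
    rng.shuffle(perm)  # perm[slot] = original index of the sentence placed at `slot`
    shuffled = [sentences[i] for i in perm]

    # The target is the inverse permutation of `perm`.
    target = [0] * len(sentences)
    for slot, orig_idx in enumerate(perm):
        target[orig_idx] = slot
    return shuffled, target


def restore(shuffled: List[str], target: List[int]) -> List[str]:
    """Reorder shuffled sentences using a (predicted or gold) target."""
    return [shuffled[j] for j in target]


if __name__ == "__main__":
    doc = ["The cat woke up.", "It stretched.", "Then it ate.", "Finally it slept."]
    shuffled, target = make_unshuffling_example(doc, seed=0)
    print(shuffled, target)
    print(restore(shuffled, target) == doc)  # the gold target always restores the input
```

In this sketch a model would see only `shuffled` and be trained to predict `target`; framing the target as "which slot comes next" is one common way to pose order reconstruction, chosen here for simplicity rather than taken from the paper.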
Taxonomy
Methods: Linear Layer · Dense Connections · Linear Warmup With Linear Decay · Multi-Head Attention · Weight Decay · Attention Is All You Need · Residual Connection · Attention Dropout · Layer Normalization
