Leveraging Natural Supervision for Language Representation Learning and Generation

Mingda Chen

arXiv:2207.10617 · cs.CL · July 22, 2022

TL;DR

This thesis explores how naturally occurring supervision in textual data can improve the training, evaluation, and generation of language models, through new training losses, exploitation of structure in data, and new, challenging datasets.

Contribution

It introduces new self-supervised training methods, techniques for exploiting structure in textual data, and diverse datasets for evaluating language models on complex tasks.

Findings

1. Enhanced performance of pretrained models on NLP tasks.
2. Effective use of Wikipedia structure and paraphrases for knowledge extraction.
3. Creation of challenging datasets for long-form text generation and summarization.

Abstract

Recent breakthroughs in Natural Language Processing (NLP) have been driven by language models trained on massive amounts of plain text. While these models are powerful, how best to derive supervision from textual resources is still an open question. For example, language model pretraining often neglects the rich, freely available structures in textual data. In this thesis, we describe three lines of work that seek to improve the training and evaluation of neural models using naturally occurring supervision. We first investigate self-supervised training losses to help enhance the performance of pretrained language models for various NLP tasks. Specifically, we alter the sentence prediction loss to make it better suited to other pretraining losses and more challenging to solve. We design an intermediate finetuning step that uses self-supervised training to improve models' cross-task generalization.…
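The abstract says the sentence prediction loss is altered to be harder to solve, but does not spell out the construction on this page. As an illustration only, the sketch below assumes a sentence-order style objective: negatives are formed by swapping two consecutive segments from the same document, rather than by pairing a segment with one from a different document as in BERT's next-sentence prediction. Because both segments share a topic, the model has to reason about ordering and coherence instead of topic overlap. The function name `make_sop_examples` and its arguments are hypothetical, not taken from the thesis code.

```python
import random
from typing import List, Tuple


def make_sop_examples(segments: List[str], seed: int = 0) -> List[Tuple[str, str, int]]:
    """Hypothetical helper: build (first, second, label) pairs for a
    sentence-order style objective (an assumption, not the thesis code).

    `segments` are consecutive text segments from ONE document.
    label 1 = the two segments appear in their original order,
    label 0 = the same two segments with their order swapped.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible
    examples: List[Tuple[str, str, int]] = []
    # Pair each segment with the segment that immediately follows it.
    for first, second in zip(segments, segments[1:]):
        if rng.random() < 0.5:
            examples.append((first, second, 1))  # positive: original order
        else:
            examples.append((second, first, 0))  # negative: swapped order
    return examples


if __name__ == "__main__":
    doc = ["The cat sat down.", "Then it fell asleep.", "Hours later it woke."]
    for a, b, label in make_sop_examples(doc, seed=1):
        print(label, "|", a, "||", b)
```

Unlike a next-sentence negative drawn from another document, every negative here uses in-document text, so topic cues alone cannot separate the two classes.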

Code & Models

Repositories

mingdachen/syntactic-template-generation (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification