A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks

Nikunj Saunshi; Sadhika Malladi; Sanjeev Arora

arXiv:2010.03648 · cs.CL · April 15, 2021 · 25 cites

TL;DR

This paper provides a mathematical explanation for why autoregressive language models excel at downstream tasks like text classification, linking next word prediction to classification performance through formal analysis and empirical validation.

Contribution

It formalizes the connection between language modeling and classification, showing that near-optimal language models learn features useful for classification with quantifiable error bounds.
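
One way to picture "near-optimal" is through excess cross-entropy: the gap between the cross-entropy a model achieves and the lowest achievable value, which is the entropy of the true next-word distribution. The sketch below is only an illustration under that reading. The distributions `true_dist` and `model_dist` are made-up numbers for a single context, not data from the paper, and the helper name `cross_entropy` is hypothetical.

```python
import math

def cross_entropy(true_dist, model_dist):
    """Expected negative log-probability the model assigns to the next word."""
    return -sum(p * math.log(model_dist[w]) for w, p in true_dist.items() if p > 0)

# Hypothetical next-word distributions for one context (illustrative values only).
true_dist = {"good": 0.5, "bad": 0.3, "long": 0.2}
model_dist = {"good": 0.4, "bad": 0.4, "long": 0.2}

# The best possible cross-entropy is the entropy of the true distribution.
optimal = cross_entropy(true_dist, true_dist)
achieved = cross_entropy(true_dist, model_dist)

# Excess cross-entropy equals KL(true || model), so it is never negative.
excess = achieved - optimal
kl = sum(p * math.log(p / model_dist[w]) for w, p in true_dist.items() if p > 0)

print(f"excess cross-entropy: {excess:.4f}, KL divergence: {kl:.4f}")
```

A smaller excess means the model's next-word distribution is closer to the true one, which is the sense in which a language model can be called near-optimal in this sketch.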

Findings

01

Language modeling benefits downstream classification tasks.

02

Models with low cross-entropy learn linearly solvable features.

03

A new objective function improves classification performance.

Abstract

Autoregressive language models, pretrained using large text corpora to do well on next word prediction, have been successful at solving many downstream tasks, even with zero-shot usage. However, there is little theoretical understanding of this success. This paper initiates a mathematical study of this phenomenon for the downstream task of text classification by considering the following questions: (1) What is the intuitive connection between the pretraining task of next word prediction and text classification? (2) How can we mathematically formalize this connection and quantify the benefit of language modeling? For (1), we hypothesize, and verify empirically, that classification tasks of interest can be reformulated as sentence completion tasks, thus making language modeling a meaningful pretraining task. With a mathematical formalization of this hypothesis, we make progress towards…
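
The sentence completion idea from question (1) can be sketched with a toy example. Suppose a sentiment task is posed by appending a prompt to a review and asking which word is likely to come next. Comparing the probability of positive and negative completions is then a linear function of the next-word distribution. Everything concrete below is an assumption for illustration: the four-word vocabulary, the prompt, the probabilities, and the weight vector are invented, and `classify_by_completion` is a hypothetical helper, not code from the paper.

```python
import numpy as np

# Hypothetical tiny vocabulary; a real language model has tens of thousands of words.
VOCAB = ["great", "terrible", "long", "the"]

def classify_by_completion(next_word_probs, weights):
    """Score a context with a linear function of its next-word distribution."""
    score = float(np.dot(weights, next_word_probs))
    label = "positive" if score > 0 else "negative"
    return label, score

# Assumed weights: +1 on a positive completion, -1 on a negative one, 0 elsewhere.
weights = np.array([1.0, -1.0, 0.0, 0.0])

# Made-up next-word probabilities after a prompt such as "<review> This movie was".
probs = np.array([0.55, 0.15, 0.10, 0.20])

label, score = classify_by_completion(probs, weights)
print(label, round(score, 2))
```

Because the score is a dot product between a fixed weight vector and the predicted distribution, the classifier is linear in the features the language model already produces, which is why next-word prediction can be a meaningful pretraining task for this kind of classification.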

Videos

A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Sentiment Analysis and Opinion Mining