Exploring Length Generalization in Large Language Models

Cem Anil; Yuhuai Wu; Anders Andreassen; Aitor Lewkowycz; Vedant Misra; Vinay Ramasesh; Ambrose Slone; Guy Gur-Ari; Ethan Dyer; Behnam Neyshabur

arXiv:2207.04901 · cs.CL · November 15, 2022 · 45 cites

TL;DR

This paper investigates how large language models generalize to longer reasoning problems, finding that in-context learning with scratchpad prompting significantly improves their ability to extrapolate to longer instances.

Contribution

The study demonstrates that combining pretrained models' in-context learning with scratchpad prompting enhances length generalization in transformer-based language models.

Findings

01. Naive finetuning shows poor length generalization regardless of model size.
02. Scratchpad prompting dramatically improves length generalization.
03. Failure analyses reveal common sources of mistakes and opportunities for improvement.

Abstract

The ability to extrapolate from short problem instances to longer ones is an important form of out-of-distribution generalization in reasoning tasks, and is crucial when learning from datasets where longer problem instances are rare. These include theorem proving, solving quantitative mathematics problems, and reading/summarizing novels. In this paper, we run careful empirical studies exploring the length generalization capabilities of transformer-based language models. We first establish that naively finetuning transformers on length generalization tasks shows significant generalization deficiencies independent of model scale. We then show that combining pretrained large language models' in-context learning abilities with scratchpad prompting (asking the model to output solution steps before producing an answer) results in a dramatic improvement in length generalization. We run careful…
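
To make the idea of scratchpad prompting concrete, here is a minimal sketch of how a few-shot scratchpad prompt might be assembled. The task (parity of a bit string), the prompt layout, and the function names `parity_scratchpad` and `build_prompt` are all assumptions chosen for illustration; they are not taken from the abstract and should not be read as the authors' exact format. The in-context examples are short, while the query is longer, which mirrors the length-generalization setting described above.

```python
from typing import List


def parity_scratchpad(bits: List[int]) -> str:
    """Write out intermediate steps (running parity) before the final answer.

    Hypothetical scratchpad format, used only for illustration.
    """
    running = 0
    steps = []
    for i, bit in enumerate(bits, start=1):
        running ^= bit  # update the running parity with the current bit
        steps.append(f"step {i}: bit={bit}, running parity={running}")
    steps.append(f"answer: {running}")
    return "\n".join(steps)


def build_prompt(examples: List[List[int]], query: List[int]) -> str:
    """Assemble a few-shot prompt: worked short examples, then a longer query.

    The prompt ends at the start of the query's scratchpad so that a model
    would continue by producing the steps and then the answer.
    """
    blocks = []
    for bits in examples:
        header = "input: " + " ".join(str(b) for b in bits)
        blocks.append(header + "\n" + parity_scratchpad(bits))
    query_header = "input: " + " ".join(str(b) for b in query)
    blocks.append(query_header + "\nstep 1:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    short_examples = [[1, 0], [1, 1, 0]]   # short in-context demonstrations
    longer_query = [1, 0, 1, 1, 0, 1]      # longer instance to extrapolate to
    print(build_prompt(short_examples, longer_query))
```

The resulting string would be sent to a pretrained model as-is; no model call is shown here because the abstract does not specify one.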

Videos

Exploring Length Generalization in Large Language Models · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification