Induced Natural Language Rationales and Interleaved Markup Tokens Enable Extrapolation in Large Language Models

Mirelle Bueno, Carlos Gemmell, Jeffrey Dalton, Roberto Lotufo, Rodrigo Nogueira

arXiv:2208.11445 · cs.CL · November 29, 2022

TL;DR

This paper shows that large language models can extrapolate to sequences longer than those in their training examples by generating step-by-step rationales and interleaving markup tokens with their output, without changing their architecture or training procedure.

Contribution

Demonstrates that explicit rationales and markup tokens enable sequence extrapolation in large language models without architectural modifications.

Findings

01. Step-by-step rationales help communicate the task to the model.

02. Markup tokens help the model keep track of token positions in longer sequences.

03. Models generalize better when given explicit guidance on the surface form of the output.
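The second finding can be illustrated with a minimal Python sketch of interleaving numbered markup tokens with an output sequence. The `<k>` marker format and the helper names `interleave_markup` and `strip_markup` are hypothetical choices for illustration, not the format used in the paper or its repository. The sketch also assumes tokens contain no whitespace.

```python
from typing import List


def interleave_markup(tokens: List[str]) -> str:
    """Prefix each output token with a numbered markup token: <1> a <2> b ...

    The "<k>" marker format is a hypothetical choice for illustration.
    """
    return " ".join(f"<{i}> {tok}" for i, tok in enumerate(tokens, start=1))


def strip_markup(text: str) -> List[str]:
    """Recover the plain tokens, checking that markers count up from <1> by one.

    Assumes tokens contain no whitespace, so the text alternates marker, token.
    """
    fields = text.split()
    if len(fields) % 2 != 0:
        raise ValueError("expected alternating marker/token pairs")
    tokens = []
    # Even positions hold markers, odd positions hold the actual tokens.
    for expected, (marker, tok) in enumerate(zip(fields[0::2], fields[1::2]), start=1):
        if marker != f"<{expected}>":
            raise ValueError(f"expected <{expected}>, got {marker}")
        tokens.append(tok)
    return tokens


if __name__ == "__main__":
    tagged = interleave_markup(["7", "3", "9"])
    print(tagged)                # <1> 7 <2> 3 <3> 9
    print(strip_markup(tagged))  # ['7', '3', '9']
```

Because each output token is preceded by its explicit index, a model emitting this format does not have to infer position implicitly. A skipped or repeated position also shows up as a marker mismatch when the output is parsed.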

Abstract

The ability to extrapolate, i.e., to make predictions on sequences that are longer than those presented as training examples, is a challenging problem for current deep learning models. Recent work shows that this limitation persists in state-of-the-art Transformer-based models. Most solutions to this problem use specific architectures or training methods that do not generalize to other tasks. We demonstrate that large language models can succeed in extrapolation without modifying their architecture or training procedure. Our experimental results show that generating step-by-step rationales and introducing marker tokens are both required for effective extrapolation. First, we induce a language model to produce step-by-step rationales before outputting the answer to effectively communicate the task to the model. However, as sequences become longer, we find that current models struggle to…
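The abstract describes inducing the model to write a step-by-step rationale before giving the answer. As a hypothetical illustration, a few-shot prompt with such rationales for multi-digit addition might be built as shown below. The task, the wording of each step, and the names `addition_rationale` and `build_prompt` are assumptions made for this sketch. They are not the paper's actual prompts.

```python
from typing import List, Tuple


def addition_rationale(a: int, b: int) -> str:
    """Spell out column addition of two non-negative integers, then the answer.

    The step wording is a hypothetical rationale format for illustration.
    """
    if a < 0 or b < 0:
        raise ValueError("non-negative integers only")
    # Reverse the digit strings so index 0 is the least significant digit.
    da, db = str(a)[::-1], str(b)[::-1]
    carry = 0
    steps = []
    for i in range(max(len(da), len(db))):
        x = int(da[i]) if i < len(da) else 0
        y = int(db[i]) if i < len(db) else 0
        total = x + y + carry
        steps.append(
            f"Step {i + 1}: {x} + {y} + carry {carry} = {total}, "
            f"write {total % 10}, carry {total // 10}."
        )
        carry = total // 10
    if carry:
        steps.append(f"Step {len(steps) + 1}: write the final carry {carry}.")
    steps.append(f"Answer: {a + b}")
    return "\n".join(steps)


def build_prompt(examples: List[Tuple[int, int]], query: Tuple[int, int]) -> str:
    """Few-shot prompt: worked examples with rationales, then an open question."""
    blocks = [f"Question: {a} + {b}\n{addition_rationale(a, b)}" for a, b in examples]
    # The prompt stops after the new question; the model is meant to continue
    # by writing its own step-by-step rationale and final answer.
    blocks.append(f"Question: {query[0]} + {query[1]}\n")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(build_prompt([(57, 68)], (12, 30)))
```

For instance, `addition_rationale(57, 68)` produces three steps, the last of which writes the final carry 1, followed by `Answer: 125`.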

Code & Models

Repositories

mirelleb/induced-rationales-markup-tokens (official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications