Induced Natural Language Rationales and Interleaved Markup Tokens Enable Extrapolation in Large Language Models
Mirelle Bueno, Carlos Gemmell, Jeffrey Dalton, Roberto Lotufo, Rodrigo, Nogueira

TL;DR
This paper shows that large language models can effectively extrapolate to longer sequences by generating step-by-step rationales and using interleaved markup tokens, without changing their architecture or training process.
Contribution
Demonstrates that explicit rationales and markup tokens enable sequence extrapolation in large language models without architectural modifications.
Findings
Step-by-step rationales improve task communication.
Markup tokens help track token positions for longer sequences.
Models can generalize better with surface form guidance.
Abstract
The ability to extrapolate, i.e., to make predictions on sequences that are longer than those presented as training examples, is a challenging problem for current deep learning models. Recent work shows that this limitation persists in state-of-the-art Transformer-based models. Most solutions to this problem use specific architectures or training methods that do not generalize to other tasks. We demonstrate that large language models can succeed in extrapolation without modifying their architecture or training procedure. Our experimental results show that generating step-by-step rationales and introducing marker tokens are both required for effective extrapolation. First, we induce a language model to produce step-by-step rationales before outputting the answer to effectively communicate the task to the model. However, as sequences become longer, we find that current models struggle to…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
