Perturbation is All You Need for Extrapolating Language Models

Zetai Cen; Jin Zhu; Xinwei Shen; Chengchun Shi

arXiv:2605.04344 · stat.ML · May 7, 2026

TL;DR

This paper proposes a perturbation-based training framework for large language models that improves their ability to predict token sequences lying outside the support of the training data. The approach is backed by theoretical analysis and by experiments on synthetic and real-world language data.

Contribution

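It introduces a perturbation-based approach to language modeling that improves extrapolability, i.e., the capacity to make reliable predictions for sequences outside the empirical support of the training corpus. It also develops a rigorous theory for such out-of-support predictions.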

Findings

01. Improved out-of-support prediction accuracy.

02. Maintains competitive in-support performance.

03. Theoretically grounded in an analysis of extrapolability.

Abstract

We introduce a simple yet powerful framework for training large language models. In contrast to the standard autoregressive next-token prediction based on an exact prefix, we propose a perturbation-based procedure that first transforms the prefix into a semantic neighbor and then conditions on this perturbed variant for next-token prediction. This yields a hierarchical model with a pre-post-additive noise structure. Within this framework, we develop a rigorous theory of extrapolability, namely, the capacity of a model class to make reliable predictions for token sequences that lie outside the empirical support of the training corpus. We evaluate the finite-sample performance of the proposed procedure using both synthetic and real-world language data. Results show that the proposed method consistently improves out-of-support prediction while maintaining competitive in-support…
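The abstract describes the training step only at a high level: perturb the prefix into a semantic neighbor, then predict the next token from that perturbed prefix. The toy Python sketch below illustrates the idea under loudly stated assumptions. The perturbation is a hypothetical token-level synonym map (`NEIGHBORS`) applied with probability `p`. The "language model" is a bigram count table keyed on the last token of the perturbed prefix. At prediction time it conditions on the exact prefix. None of these choices come from the paper, and `perturb`, `train`, and `predict` are illustrative names, not the authors' implementation.

```python
import random
from collections import Counter, defaultdict

# HYPOTHETICAL "semantic neighbor" map: each token may be swapped for a
# synonym-like token. This is an illustrative assumption, not the paper's
# perturbation procedure.
NEIGHBORS = {
    "cat": ["kitten"],
    "kitten": ["cat"],
    "dog": ["puppy"],
    "puppy": ["dog"],
}


def perturb(prefix, p, rng):
    """Replace each token that has neighbors by a random neighbor with probability p."""
    out = []
    for tok in prefix:
        if tok in NEIGHBORS and rng.random() < p:
            out.append(rng.choice(NEIGHBORS[tok]))
        else:
            out.append(tok)
    return out


def train(corpus, p=0.5, epochs=50, seed=0):
    """Toy next-token model: bigram counts keyed on the last token of the
    *perturbed* prefix, while the target is the true next token."""
    rng = random.Random(seed)
    counts = defaultdict(Counter)
    for _ in range(epochs):
        for sentence in corpus:
            for t in range(1, len(sentence)):
                noisy = perturb(sentence[:t], p, rng)  # semantic-neighbor prefix
                counts[noisy[-1]][sentence[t]] += 1    # condition on perturbed prefix
    return counts


def predict(counts, prefix):
    """Most frequent next token given the exact prefix, or None if its last
    token was never seen as a conditioning context during training."""
    ctx = counts.get(prefix[-1])  # .get does not insert missing keys
    return ctx.most_common(1)[0][0] if ctx else None


corpus = [["the", "cat", "sleeps"], ["the", "dog", "barks"]]
baseline = train(corpus, p=0.0)   # standard training on exact prefixes
perturbed = train(corpus, p=0.5)  # perturbation-based training
print(predict(baseline, ["the", "kitten"]))   # None: "kitten" is out of support
print(predict(perturbed, ["the", "kitten"]))  # sleeps
```

Because training sometimes replaces "cat" with its neighbor "kitten", the perturbed model can answer for the prefix `the kitten`, which never occurs in the corpus, whereas the unperturbed baseline has no entry for it. In-support prefixes such as `the cat` still receive their original continuation.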