The Remarkable Robustness of LLMs: Stages of Inference?
Vedang Lad, Jin Hwa Lee, Wes Gurnee, Max Tegmark

TL;DR
This paper shows that Large Language Models are highly robust to deleting and swapping adjacent layers during inference, retaining 72-95% of their original top-1 prediction accuracy without fine-tuning, and proposes a four-stage inference framework based on layer sensitivity patterns.
Contribution
It proposes a four-stage hypothesis of LLM inference, supported by empirical evidence of layer-specific robustness and sensitivity patterns across diverse model families and sizes.
Findings
Models retain 72-95% of their original top-1 prediction accuracy after layer deletion or adjacent-layer swaps, without fine-tuning
Early and final layers are most sensitive to modifications
Middle layers show remarkable robustness to interventions
Abstract
We investigate the robustness of Large Language Models (LLMs) to structural interventions by deleting and swapping adjacent layers during inference. Surprisingly, models retain 72-95% of their original top-1 prediction accuracy without any fine-tuning. We find that performance degradation is not uniform across layers: interventions to the early and final layers cause the most degradation, while the model is remarkably robust to dropping middle layers. This pattern of localized sensitivity motivates our hypothesis of four stages of inference, observed across diverse model families and sizes: (1) detokenization, where local context is integrated to lift raw token embeddings into higher-level representations; (2) feature engineering, where task- and entity-specific features are iteratively refined; (3) prediction ensembling, where hidden states are aggregated into plausible next-token…
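The two interventions described in the abstract, deleting a layer and swapping two adjacent layers at inference time, can be illustrated with a minimal sketch. This is not the authors' code: the toy residual layers, the names `make_residual_layer`, `run`, `delete_layer`, and `swap_adjacent`, and the scalar "hidden state" are all assumptions chosen to keep the example self-contained. A real experiment would apply the same list manipulation to the transformer blocks of a pretrained model.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_residual_layer(scale: float, shift: float) -> Layer:
    """Toy residual block (hypothetical stand-in for attention + MLP).

    Computes x + (scale * x + shift) element-wise, so the input is carried
    forward through a skip connection as in a transformer block.
    """
    def layer(x: Vector) -> Vector:
        return [xi + (scale * xi + shift) for xi in x]
    return layer


def run(layers: Sequence[Layer], x: Vector) -> Vector:
    """Apply the layers in order, like a forward pass through the stack."""
    for layer in layers:
        x = layer(x)
    return x


def delete_layer(layers: Sequence[Layer], i: int) -> List[Layer]:
    """Return a new stack with layer i removed (no fine-tuning afterwards)."""
    if not 0 <= i < len(layers):
        raise IndexError("layer index out of range")
    return list(layers[:i]) + list(layers[i + 1:])


def swap_adjacent(layers: Sequence[Layer], i: int) -> List[Layer]:
    """Return a new stack with layers i and i + 1 exchanged."""
    if not 0 <= i < len(layers) - 1:
        raise IndexError("need a valid adjacent pair (i, i + 1)")
    out = list(layers)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


if __name__ == "__main__":
    stack = [
        make_residual_layer(0.0, 1.0),
        make_residual_layer(1.0, 0.0),
        make_residual_layer(0.0, 2.0),
    ]
    x0 = [1.0]
    print("baseline:", run(stack, x0))
    print("delete layer 1:", run(delete_layer(stack, 1), x0))
    print("swap layers 0 and 1:", run(swap_adjacent(stack, 0), x0))
```

Both helpers return a new list rather than mutating the original stack, so one baseline model can be reused for a sweep over every layer index, which is the kind of per-layer comparison the sensitivity findings rely on.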
Taxonomy
Topics: Artificial Intelligence in Law
Methods: ALIGN
