On the Runway Cascade of Transformers for Language Modeling

Hunjae Lee; Corey Clark

arXiv:2601.14522·cs.LG·January 22, 2026

Open Access

TL;DR

This paper introduces runway-aware rewiring for decoder-only transformers to address a misalignment between direct-path attention and the indirect paths through intermediate tokens, improving language modeling, retrieval, and extrapolation without adding parameters.

Contribution

The paper formalizes the runway cascade phenomenon and proposes a parameter-free rewiring method to improve information flow in causal transformers.

Findings

01 Improved language modeling performance

02 Enhanced information retrieval capabilities

03 Better extrapolation abilities

Abstract

In decoder-only (causal) transformers, the computation graph created by causal masking routes information through both direct-path attention and indirect paths formed by intermediate tokens. We denote these indirect paths between token pairs as their runways. We argue that certain failure modes of causal transformers as observed by a growing body of recent works are likely exacerbated by a misalignment between these two information propagation modes. We formalize runway cascade as a phenomenon whereby this misalignment results in redundancies and irrelevant information cascading to token representations despite adequately learned attention patterns. As a solution, we propose runway-aware rewiring as a more explicit way of incorporating runway context directly into each token's direct-path attention. This mechanism re-wires the attention pattern for each token based on a summary of its…
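The distinction the abstract draws between direct-path attention and indirect paths through intermediate tokens can be made concrete with a small counting exercise. The sketch below is not the paper's formalism or its rewiring method. It only counts how many layer-by-layer routes exist from an earlier token to a later one under a standard causal mask. The function names (`causal_mask`, `count_paths`, `intermediate_tokens`) are hypothetical and chosen for illustration, and the model is assumed to use plain causal attention in which each position may attend to itself and to every earlier position.

```python
def causal_mask(n):
    """mask[i][j] is 1 when position i may attend to position j (j <= i)."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Multiply two square integer matrices given as nested lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def count_paths(n, num_layers):
    """Count routes from token j's input to token i's output.

    Each layer contributes one hop along an allowed attention edge, so the
    count is entry [i][j] of the causal mask raised to the power num_layers.
    """
    mask = causal_mask(n)
    paths = mask
    for _ in range(num_layers - 1):
        paths = matmul(mask, paths)
    return paths


def intermediate_tokens(i, j):
    """Positions strictly between source j and target i (illustrative only)."""
    return list(range(j + 1, i))


if __name__ == "__main__":
    n = 5
    for layers in (1, 2, 3):
        # Routes from the first token (j = 0) to the last token (i = n - 1).
        print(layers, count_paths(n, layers)[n - 1][0])
    print(intermediate_tokens(n - 1, 0))
```

With one layer there is a single direct edge between any allowed pair. Each additional layer multiplies in further routes that pass through intermediate positions, so under these assumptions the indirect routes quickly outnumber the direct one.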

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Natural Language Processing Techniques