Ablate and Rescue: A Causal Analysis of Residual Stream Hyper-Connections
William Peng, Josheev Rai, Kevin Tseng, Siwei Wang, Sean Wu

TL;DR
This paper introduces a causal analysis framework for multi-stream transformer architectures, specifically Manifold-Constrained Hyper-Connections (mHC), and uses it to show how information is distributed and used across residual streams.
Contribution
It provides the first mechanistic analysis of mHC architectures using a novel ablate-and-rescue causal intervention method.
Findings
Residual streams show functional redundancy and asymmetric utilization.
Information distribution across streams is more complex than representational similarity suggests.
The ablate-and-rescue framework enables causal comparison of residual streams.
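To make the ablate-and-rescue idea in the findings above concrete, here is a minimal toy sketch in NumPy. It is not the authors' implementation. Everything in it is an assumption made for illustration: the toy `forward` pass, the use of Sinkhorn normalisation to get doubly stochastic stream-mixing matrices, ablation as zeroing one stream, rescue as overwriting the ablated stream with a donor stream, and the L2 distance on the output as the effect size. The helper names (`sinkhorn`, `ablate`, `rescue`, `effect`) are hypothetical.

```python
import numpy as np

def sinkhorn(logits, n_iters=50):
    """Approximately normalise a square matrix to be doubly stochastic (assumed constraint)."""
    m = np.exp(logits)
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)  # normalise rows
        m = m / m.sum(axis=0, keepdims=True)  # normalise columns
    return m

def forward(x, mixers, weights, writes, intervene=None):
    """Toy multi-stream forward pass (a stand-in, not a real transformer).

    intervene: optional (layer_index, fn) applied to the stream state
    just before that layer runs.
    """
    n = mixers[0].shape[0]
    streams = np.tile(x, (n, 1))                   # (n_streams, d), all start equal
    for layer, (mix, w, write) in enumerate(zip(mixers, weights, writes)):
        if intervene is not None and intervene[0] == layer:
            streams = intervene[1](streams.copy())
        read = streams.mean(axis=0)                # the block reads from all streams
        update = np.tanh(read @ w)                 # stand-in for attention/MLP
        streams = mix @ streams + np.outer(write, update)  # mix, then write back
    return streams.mean(axis=0)

def ablate(i):
    """Intervention: zero out stream i."""
    def fn(s):
        s[i] = 0.0
        return s
    return fn

def rescue(i, donor):
    """Intervention: overwrite stream i with the donor stream's state."""
    def fn(s):
        s[i] = s[donor]
        return s
    return fn

def effect(x, params, intervention):
    """L2 distance between the intervened output and the clean output."""
    base = forward(x, *params)
    return float(np.linalg.norm(forward(x, *params, intervene=intervention) - base))

rng = np.random.default_rng(0)
n_streams, d, n_layers = 4, 8, 3
params = (
    [sinkhorn(0.5 * rng.normal(size=(n_streams, n_streams))) for _ in range(n_layers)],
    [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(n_layers)],
    [rng.uniform(0.5, 1.5, size=n_streams) for _ in range(n_layers)],  # per-stream write gains
)
x = rng.normal(size=d)

layer = 1  # intervene after the streams have had one layer to diverge
ablation_effect = effect(x, params, (layer, ablate(0)))
rescue_effects = {j: effect(x, params, (layer, rescue(0, j))) for j in range(1, n_streams)}
print("ablate stream 0:", ablation_effect)
for j, e in rescue_effects.items():
    print(f"rescue stream 0 with stream {j}:", e)
```

In this toy setting, a donor whose rescue effect is much smaller than the ablation effect carries information that can stand in for the ablated stream, and unequal effects across donors would correspond to asymmetric utilization. A real analysis would apply the same pattern to the residual streams of a trained model and measure a task metric such as loss rather than an output distance.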
Abstract
Multi-stream transformer architectures have recently been proposed as a promising direction for managing representation collapse and the vanishing gradient problem for residual connections, yet their internal mechanisms remain unexplored. In particular, the recently introduced Manifold-Constrained Hyper-Connections (mHC) architecture posits multiple residual streams with constrained interaction, but lacks in-depth mechanistic analysis. We present the first open-source mHC language model (https://huggingface.co/wgpeng/mhc-780m) and analyze the multiple-stream architecture with a suite of representation-level metrics and causal interventions to probe how parallel streams encode and utilize information. Specifically, we introduce a systematic stream ablation-and-rescue framework that enables direct causal comparison of residual streams during inference. Through targeted pairwise…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Generative Adversarial Networks and Image Synthesis
