Lambda-Skip Connections: the architectural component that prevents Rank Collapse
Federico Arangath Joseph, Jerome Sieber, Melanie N. Zeilinger, Carmen Amo Alonso

TL;DR
This paper introduces lambda-skip connections as an architectural component that guarantees prevention of rank collapse in sequence models, extending the theory from transformers to State Space Models and validating with experiments.
Contribution
It provides the first general theoretical guarantee for preventing rank collapse and extends the analysis to SSMs, highlighting the importance of skip connections.
Findings
Lambda-skip connections prevent rank collapse under certain conditions.
Architectural components like skip connections and gating mechanisms are crucial for preventing rank collapse.
Theoretical analysis applies to both transformers and SSMs.
Abstract
Rank collapse, a phenomenon where embedding vectors in sequence models rapidly converge to a uniform token or equilibrium state, has recently gained attention in the deep learning literature. This phenomenon leads to reduced expressivity and potential training instabilities due to vanishing gradients. Empirical evidence suggests that architectural components like skip connections, LayerNorm, and MultiLayer Perceptrons (MLPs) play critical roles in mitigating rank collapse. While this issue is well-documented for transformers, alternative sequence models, such as State Space Models (SSMs), which have recently gained prominence, have not been thoroughly examined for similar vulnerabilities. This paper extends the theory of rank collapse from transformers to SSMs using a unifying framework that captures both architectures. We study how a parametrized version of the classic skip connection…
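To make the phenomenon described in the abstract concrete, here is a minimal toy sketch in plain Python. It is not the paper's model or its proof setting. Everything in it is an assumption made for illustration: the names `collapse_metric`, `lambda_skip_layer`, and `run` are hypothetical, uniform averaging over tokens stands in for a learned attention or SSM mixing matrix, LayerNorm and MLPs are omitted, and the skip branch is simply the layer input scaled by a scalar `lam`. The metric is the relative Frobenius distance between the token matrix and the matrix in which every token equals the mean token, so a value near zero means the tokens have collapsed.

```python
import math

def mean_token(X):
    """Average of all token vectors (rows of X)."""
    n = len(X)
    return [sum(row[k] for row in X) / n for k in range(len(X[0]))]

def collapse_metric(X):
    """Relative Frobenius distance between X and its 'all tokens equal' version."""
    m = mean_token(X)
    residual = sum((row[k] - m[k]) ** 2 for row in X for k in range(len(m)))
    total = sum(v * v for row in X for v in row)
    return math.sqrt(residual / total)

def lambda_skip_layer(X, lam):
    """Toy layer (assumption): uniform token mixing plus a skip branch scaled by lam."""
    m = mean_token(X)  # uniform averaging stands in for an attention matrix
    return [[m[k] + lam * row[k] for k in range(len(m))] for row in X]

def run(X, lam, depth):
    """Apply the toy layer `depth` times and report the collapse metric."""
    for _ in range(depth):
        X = lambda_skip_layer(X, lam)
    return collapse_metric(X)

# Three 2-dimensional tokens, chosen by hand for the illustration.
X0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
no_skip = run(X0, lam=0.0, depth=5)       # mixing only
classic_skip = run(X0, lam=1.0, depth=5)  # classic skip connection

print(f"start: {collapse_metric(X0):.4f}")
print(f"lam=0: {no_skip:.2e}")
print(f"lam=1: {classic_skip:.4f}")
```

In this linear toy, the layer without a skip branch collapses the tokens to their mean after a single step, while the skip branch with `lam=1.0` keeps part of the token-specific signal at every depth. The metric still shrinks with depth for `lam=1.0`, so the sketch only shows how the skip strength changes the rate of collapse. The conditions under which the paper guarantees that collapse is prevented are not reproduced here.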
Taxonomy
Topics: BIM and Construction Integration · Construction Engineering and Safety · Structural Engineering and Vibration Analysis
Methods: Softmax · Attention Is All You Need
