TL;DR
This paper proposes a hypothesis and a framework arguing that concept-aligned directions in deep networks originate in the input space and are amplified with depth, with the aim of explaining how linear representations emerge.
Contribution
The paper proposes the Input-Space Linearity Hypothesis (ISLH) and the Spectral Principal Path (SPP) framework, which link input-space structure to deep network representations, and it reports that these representations remain robust across modalities in Vision-Language Models (VLMs).
Findings
Concept directions originate in input space and amplify with depth.
Deep networks distill linear representations along spectral directions.
Representations are robust across vision-language modalities.
Abstract
High-level representations have become a central focus in enhancing AI transparency and control, shifting attention from individual neurons or circuits to structured semantic directions that align with human-interpretable concepts. Motivated by the Linear Representation Hypothesis (LRH), we propose the Input-Space Linearity Hypothesis (ISLH), which posits that concept-aligned directions originate in the input space and are selectively amplified with increasing depth. We then introduce the Spectral Principal Path (SPP) framework, which formalizes how deep networks progressively distill linear representations along a small set of dominant spectral directions. Building on this framework, we further demonstrate the multimodal robustness of these representations in Vision-Language Models (VLMs). By bridging theoretical insights with empirical validation, this work advances a structured…
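The spectral picture in the abstract can be illustrated with a toy experiment. The sketch below is not the authors' SPP implementation: it assumes a purely linear stack of layers with random Gaussian weights (the simplification the reviewers also discuss), and the function name `spectral_concentration` and its parameters `depth`, `width`, and `seed` are hypothetical. It collapses the stack into a single cumulative matrix and tracks how much of the squared singular-value mass sits in the top singular direction as depth grows.

```python
import numpy as np


def spectral_concentration(depth=40, width=8, seed=0):
    """Toy deep linear network (assumption: no non-linearities, random weights).

    Returns the layer matrices, the collapsed product W_depth ... W_1, and the
    fraction of squared singular-value mass carried by the top singular
    direction of the cumulative product after each layer.
    """
    rng = np.random.default_rng(seed)
    # Scale by 1/sqrt(width) so activations neither explode nor vanish quickly.
    layers = [rng.standard_normal((width, width)) / np.sqrt(width)
              for _ in range(depth)]

    product = np.eye(width)
    top_energy = []
    for W in layers:
        # A purely linear stack collapses into one matrix: the running product.
        product = W @ product
        s = np.linalg.svd(product, compute_uv=False)  # sorted, descending
        top_energy.append(float(s[0] ** 2 / np.sum(s ** 2)))
    return layers, product, top_energy


if __name__ == "__main__":
    _, _, energy = spectral_concentration()
    for layer_index in (1, 10, 20, 40):
        print(f"depth {layer_index:2d}: top-direction energy fraction "
              f"{energy[layer_index - 1]:.3f}")
```

In this toy setting the energy fraction tends to rise with depth, which is one simple way to picture a few spectral directions becoming dominant. Whether trained, non-linear networks behave the same way is the empirical question the paper addresses, and this sketch does not settle it.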
Peer Reviews
Decision · Submitted to ICLR 2026
The paper is well-written, quite self-contained, and has well-crafted illustrations. The proposed method is both creative and original. The authors honestly point out that “our current framework is subject to several limitations.” (conclusion). While the developed method rests on a few key assumptions, it is elegant and well thought through. Given that the assumptions hold in practice, which the presented results suggest, the framework is a notable step towards better understanding how linear
1. Introduction: covers the core concepts; however, readers unfamiliar with the LRH paper (Park et al. 2023) have considerably less context for understanding the setting. 2. The theoretical derivation of the SPP method hinges on having a generalized network, like equation 5, without non-linearities. Yet, little discussion is provided on the limitations that come with this assumption. 3. It wasn’t immediately clear to me why the second term in equation 8 is inserted. Based on the model specific
1. The analysis of the LRH and the inquiry into why it emerges are interesting, and analysing it through intermediate layers is sound.
1. LRH has mainly been discussed in text-only contexts, and the ISLH is also formulated that way. Yet the experiments use a VLM, and all actual manipulations seem to be on the language side. It’s unclear what the vision modality contributes here; this effectively invalidates the claimed "raw-space" perspective. 2. The exposition is confusing (see Questions). 3. Theoretical claims are made for linear networks. Since a purely linear network can be collapsed into a single matrix, it’s unclear how t
The work bridges theory and interpretability, offering an appealing spectral perspective on how neural networks encode and stabilize concepts. The proposed Spectral Principal Path framework provides a unified spectral mechanism that connects the input-space structure to the linear separability observed in deep representations, and this link is well articulated in the main text. Moreover, the provided evidence of singular vector stabilization and selective singular value growth provides some
My main concerns revolve around some of the authors' experimental choices and some restrictive assumptions. Specifically: *Simplified architecture assumption*: The SPP derivation assumes a purely stacked linear model, which may not capture non-linearities or normalization effects critical in deep networks. How robust is the theory under more realistic settings? Would the insights provided in the manuscript extend to more common architectures? *Residual and attention extensions*: In t
Taxonomy
Methods: Softmax · Attention Is All You Need · Focus · ALIGN · Sparse Evolutionary Training
