Features Emerge as Discrete States: The First Application of SAEs to 3D Representations
Albert Miao, Chenliang Zhou, Jiawei Zhou, Cengiz Oztireli

TL;DR
This paper applies Sparse Autoencoders (SAEs) to the representations of a 3D reconstruction VAE. It shows that the model encodes discrete features and undergoes phase-like transitions, which offers insight into feature learning dynamics in 3D neural models.
Contribution
First application of SAEs to 3D data, demonstrating discrete feature encoding and phase transitions in a state-of-the-art 3D reconstruction VAE.
Findings
Models encode discrete rather than continuous features.
Reconstruction loss exhibits sigmoidal behavior due to phase transitions.
Features influence the redistribution of interference, affecting saliency.
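The sigmoidal reconstruction loss listed above can be illustrated with a toy curve. The sketch below makes two assumptions for illustration only. First, loss is plotted against a scalar control variable, here a hypothetical intervention scale `s`. Second, the transition point is taken where the curve is steepest, in the spirit of the max-slope check mentioned in the reviews. `sigmoid_curve` and `max_slope_threshold` are hypothetical helpers, and the curve is synthetic, not data from the paper.

```python
import numpy as np

def sigmoid_curve(s, t, k):
    """Synthetic loss that switches between two discrete levels around threshold t.

    Hypothetical toy model: k controls how abrupt the switch is.
    """
    return 1.0 / (1.0 + np.exp(-k * (s - t)))

def max_slope_threshold(s, loss):
    """Estimate the transition point as the s where |d loss / d s| is largest."""
    slope = np.gradient(loss, s)  # central differences in the interior
    return s[np.argmax(np.abs(slope))]

# Hypothetical sweep of the control variable and a synthetic loss curve
scales = np.linspace(0.0, 1.0, 101)
loss = sigmoid_curve(scales, t=0.6, k=25.0)
t_hat = max_slope_threshold(scales, loss)
print(round(float(t_hat), 2))
```

A larger `k` makes the switch between the two states more abrupt. On real, noisy measurements one would likely smooth the curve or fit a sigmoid before differentiating, since raw finite differences amplify noise.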
Abstract
Sparse Autoencoders (SAEs) are a powerful dictionary learning technique for decomposing neural network activations. They translate hidden states into human-interpretable concepts with high semantic value, without external intervention or guidance. However, this technique has rarely been applied outside the textual domain, limiting theoretical explorations of feature decomposition. We present the first application of SAEs to the 3D domain, analyzing the features used by a state-of-the-art 3D reconstruction VAE applied to 53k 3D models from the Objaverse dataset. We observe that the network encodes discrete rather than continuous features, leading to our key finding: such models approximate a discrete state space, driven by phase-like transitions from feature activations. Through this state transition framework, we address three otherwise unintuitive behaviors: the inclination of the reconstruction…
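The sketch below gives a rough picture of the technique named in the abstract. It shows the forward pass and training objective of one common SAE variant. That variant uses a ReLU encoder into an overcomplete dictionary, a linear decoder with unit-norm atoms, and an L1 sparsity penalty. It is not the paper's configuration. The class name `TinySAE`, the sizes, `l1_coeff`, and the random stand-in activations are all hypothetical, and gradient-based training is omitted.

```python
import numpy as np

class TinySAE:
    """Minimal ReLU sparse autoencoder with an L1 penalty (hypothetical sketch)."""

    def __init__(self, d_model, n_features, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0.0, 0.1, (d_model, n_features))
        self.b_enc = np.zeros(n_features)
        self.W_dec = rng.normal(0.0, 0.1, (n_features, d_model))
        # Each decoder row is one dictionary atom; normalize atoms to unit length
        self.W_dec /= np.linalg.norm(self.W_dec, axis=1, keepdims=True)
        self.b_dec = np.zeros(d_model)

    def encode(self, x):
        # Non-negative feature activations; the L1 term below pushes most to zero
        return np.maximum(0.0, (x - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, f):
        # Reconstruction as a sparse combination of dictionary atoms
        return f @ self.W_dec + self.b_dec

    def loss(self, x, l1_coeff=1e-3):
        f = self.encode(x)
        x_hat = self.decode(f)
        recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
        sparsity = np.mean(np.sum(np.abs(f), axis=-1))
        return recon + l1_coeff * sparsity, f, x_hat

# Random stand-in for hidden activations from the model under study (hypothetical)
acts = np.random.default_rng(1).normal(size=(8, 16))
sae = TinySAE(d_model=16, n_features=64)
total, f, x_hat = sae.loss(acts)
print(f.shape, x_hat.shape)  # (8, 64) (8, 16)
```

Each column of `f` tracks one dictionary feature across inputs. In an actual analysis, `acts` would be activations collected from the network being studied, and the parameters would be fit by minimizing `total`.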
Peer Reviews
Decision: ICLR 2026 Poster
1. New Theoretical Framework: The decomposition of the learning gradient into "presence" ($\alpha_j$) and "identity" ($e_j$) provides a new and powerful lens on why features emerge in a certain way, moving beyond just observing what features are learned.
2. Well-motivated: The application of SAEs to the 3D domain is motivated by the theoretical analysis, with the aim of extending interpretability.
3. Strong Explanatory Power: The proposed framework successfully connects several counter-intuitive empirical observatio…
1. The writing could be clearer, for example in the definitions of ARC and loss difference; it is not clear how to calculate them. What the colored points represent is also not explained.
2. The findings are based on the Dora-VAE architecture, which processes 3D models by sampling points and using point features and positional encodings. Introducing SAEs is an incremental contribution.
3. The paper uses the terms "discrete state space" and "phase transition" heavily. While this is…
- Interpretability for 3D data is underexplored and interesting!
- Lots of experiment runs (e.g., 848k feature interventions), which make the results robust.
- The theoretical explanation of how models see features as a decomposition of presence and identity is also interesting.
- The bimodal experiments (Fig 5) and visualization (Fig 3) are insightful.
- Validating the threshold t with max slope experiments is really nice!
- The learning dynamics and 3D contributions seem completely disjoint (although both are interesting). Moreover, the paper makes broad claims about "a generally applicable, state-based feature framework." However, all the evidence is derived from a single model architecture (Dora-VAE) on a single data modality (3D point clouds). It's impossible to know whether these findings (especially the bimodal transitions) are a fundamental property of feature learning or a specific quirk of the Dora-VAE architectu…
Unfortunately, given the current organization and writing of the main content in this paper, it is extremely difficult to identify any valuable insights for readers to learn.
(W1) The contributions of this paper are vague and poorly discussed. For example, in line 41, the authors state that "the scope of data domains has been limited: recent feature interpretability studies have focused on discrete and structured data, like image and text, rather than continuous or unordered data, ..." However, I don't see why the solution to this would be to study SAE applications on 3D data, as the authors propose in lines 48–49 and highlight in the abstract. Why don't the aut…
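The presence/identity decomposition highlighted in the reviews can be made concrete with a short derivation. This is an illustrative sketch under our own assumption, not necessarily the paper's formulation. We assume a sparse reconstruction $\hat{x} = b + \sum_j \alpha_j e_j$ trained with squared error. Here $\alpha_j \ge 0$ is feature $j$'s activation (its presence), $e_j$ is its dictionary direction (its identity), and $b$ is a bias.

```latex
\begin{aligned}
\mathcal{L} &= \tfrac{1}{2}\,\lVert x - \hat{x} \rVert_2^2, \qquad r = x - \hat{x},\\
\frac{\partial \mathcal{L}}{\partial \alpha_j} &= -\, e_j^{\top} r,\\
\nabla_{e_j} \mathcal{L} &= -\, \alpha_j\, r.
\end{aligned}
```

Under this assumption, the presence gradient depends only on how well $e_j$ aligns with the residual $r$. The identity gradient, by contrast, is scaled by $\alpha_j$. A feature with $\alpha_j = 0$ on a given input therefore leaves $e_j$ unchanged for that input.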
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Model Reduction and Neural Networks
