Features Emerge as Discrete States: The First Application of SAEs to 3D Representations

Albert Miao; Chenliang Zhou; Jiawei Zhou; Cengiz Oztireli

arXiv:2512.11263·cs.LG·December 17, 2025

Open Access · 3 Reviews

TL;DR

This paper applies Sparse Autoencoders to 3D representations, revealing that the model encodes discrete features and undergoes phase-like transitions, offering insights into feature learning dynamics in 3D neural models.

Contribution

First application of SAEs to 3D data, demonstrating discrete feature encoding and phase transitions in a state-of-the-art 3D reconstruction VAE.

Findings

01 · Models encode discrete rather than continuous features.

02 · Reconstruction loss exhibits sigmoidal behavior due to phase transitions.

03 · Features influence the redistribution of interference, affecting saliency.

Abstract

Sparse Autoencoders (SAEs) are a powerful dictionary learning technique for decomposing neural network activations, translating the hidden state into human ideas with high semantic value despite no external intervention or guidance. However, this technique has rarely been applied outside of the textual domain, limiting theoretical explorations of feature decomposition. We present the first application of SAEs to the 3D domain, analyzing the features used by a state-of-the-art 3D reconstruction VAE applied to 53k 3D models from the Objaverse dataset. We observe that the network encodes discrete rather than continuous features, leading to our key finding: such models approximate a discrete state space, driven by phase-like transitions from feature activations. Through this state transition framework, we address three otherwise unintuitive behaviors - the inclination of the reconstruction…
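As a rough illustration of the dictionary-learning idea the abstract refers to, the sketch below runs one forward pass of a generic ReLU sparse autoencoder and computes a reconstruction loss with an L1 sparsity penalty. This is a textbook-style formulation, not the architecture used in the paper: the dimensions, the L1 coefficient, the random stand-in activations, and the names `sae_forward` and `sae_loss` are all assumptions made for the example.

```python
import numpy as np


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Forward pass of a generic ReLU sparse autoencoder (illustrative only)."""
    # Encode: subtract the decoder bias, project into the dictionary, apply ReLU.
    z = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)
    # Decode: rebuild the activation as a weighted sum of dictionary directions.
    x_hat = z @ W_dec + b_dec
    return z, x_hat


def sae_loss(x, x_hat, z, l1_coeff=1e-3):
    """Mean squared reconstruction error plus an L1 penalty on feature activations."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(z), axis=1))
    return recon + l1_coeff * sparsity


# Hypothetical sizes: 16-dim activations, 64 dictionary features, batch of 8.
rng = np.random.default_rng(0)
d_model, d_dict, batch = 16, 64, 8
x = rng.normal(size=(batch, d_model))  # stand-in for hidden activations of a model
W_enc = rng.normal(scale=0.1, size=(d_model, d_dict))
b_enc = np.zeros(d_dict)
W_dec = rng.normal(scale=0.1, size=(d_dict, d_model))
b_dec = np.zeros(d_model)

z, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, x_hat, z)
```

In a real setup the weights would be trained by gradient descent on this loss over activations collected from the model under study; here they are random, so the example only shows the shapes and the structure of the objective.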

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 1

Strengths

1. New Theoretical Framework: The decomposition of the learning gradient into "presence" ($\alpha_j$) and "identity" ($e_j$) provides a new and powerful lens for why features emerge in a certain way, moving beyond just observing what features are learned.
2. Well-motivated: The application of SAEs to the 3D domain is motivated by the theoretical analysis, with the aim of extending interpretability.
3. Strong Explanatory Power: The proposed framework successfully connects several counter-intuitive empirical observatio

Weaknesses

1. The writing can be improved for clarity, such as the definitions of ARC and loss difference. It is not clear how to calculate them. What the colored points represent is also not explained.
2. The findings are based on the Dora-VAE architecture, which processes 3D models by sampling points and using point features and positional encodings. The introduction of SAE is an incremental contribution.
3. The paper uses the terms "discrete state space" and "phase transition" heavily. While this is

Reviewer 02 · Rating 6 · Confidence 3

Strengths

- Interpretability for 3D data is underexplored and interesting!
- Lots of experiment runs (e.g., 848k feature interventions) which makes the results robust.
- The theoretical explanation of how models see features as a decomposition of presence and identity is also interesting.
- The bimodal experiments (Fig 5) and visualization (Fig 3) are insightful.
- Validating the threshold t with max slope experiments is really nice!

Weaknesses

- The learning dynamics and 3D contributions seem completely disjoint (although both interesting). Moreover, the paper makes broad claims about "a generally applicable, state-based feature framework." However, all the evidence is derived from a single model architecture (Dora-VAE) on a single data modality (3D point clouds). It's impossible to know if these findings (especially the bimodal transitions) are a fundamental property of feature learning, or a specific quirk of the Dora-VAE architectu

Reviewer 03 · Rating 2 · Confidence 3

Strengths

Unfortunately, given the current organization and writing of the main content in this paper, it is extremely difficult to identify any valuable insights for readers to learn.

Weaknesses

(W1) The contributions of this paper are vague and poorly discussed. For example, in line 41, the author states that "the scope of data domains has been limited — recent feature interpretability studies have focused on discrete and structured data, like image and text, rather than continuous or unordered data, ..." However, I don't see why the solution to this would be to study SAE applications on 3D data, as the authors mentioned in lines 48–49 and highlighted in the abstract. Why don't the aut

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Model Reduction and Neural Networks