Control-oriented Clustering of Visual Latent Representation
Han Qi, Haocheng Yin, Heng Yang

TL;DR
This paper explores the geometry of visual representations in image-based control, revealing a clustering law similar to neural collapse, which can be exploited to enhance policy performance in control tasks.
Contribution
It uncovers a neural collapse-like clustering phenomenon in control-oriented visual representations and demonstrates how pretraining with this principle improves control policy performance.
Findings
Clustering in visual representations aligns with control-relevant classes.
Pretraining with neural collapse regularization improves test performance by 10-35%.
Control-oriented visual features benefit real-world control tasks.
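To make the clustering finding concrete, here is a minimal sketch of how a neural collapse-like structure could be measured on a batch of latent features. It is an illustration, not the authors' evaluation code: the function name `nc_metrics`, the variability ratio, and the use of pairwise cosines between centered class means are assumptions chosen because they are common NC diagnostics.

```python
import numpy as np

def nc_metrics(features, labels):
    """Hypothetical NC diagnostics (not the paper's exact metrics).

    features: (N, D) array of latent vectors.
    labels:   (N,) integer class labels.
    Returns (variability, cosines):
      variability: within-class variance divided by between-class variance;
                   it approaches 0 as each class collapses to its mean.
      cosines:     off-diagonal cosines between globally centered class means;
                   for a simplex ETF with K classes they all equal -1/(K-1).
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)

    global_mean = features.mean(axis=0)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    centered = means - global_mean

    # Mean squared distance of each sample to its own class mean.
    within = np.mean([
        np.sum((features[labels == c] - m) ** 2, axis=1).mean()
        for c, m in zip(classes, means)
    ])
    # Mean squared norm of the centered class means.
    between = np.mean(np.sum(centered ** 2, axis=1))
    variability = within / between

    unit = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    cos = unit @ unit.T
    cosines = cos[~np.eye(len(classes), dtype=bool)]
    return variability, cosines
```

On perfectly collapsed features the variability is zero and every cosine equals -1/(K-1); a regularizer in this spirit would push both quantities toward those values during pretraining.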
Abstract
We initiate a study of the geometry of the visual representation space -- the information channel from the vision encoder to the action decoder -- in an image-based control pipeline learned from behavior cloning. Inspired by the phenomenon of neural collapse (NC) in image classification (arXiv:2008.08186), we empirically demonstrate the prevalent emergence of a similar law of clustering in the visual representation space. Specifically, in discrete image-based control (e.g., Lunar Lander), the visual representations cluster according to the natural discrete action labels; in continuous image-based control (e.g., Planar Pushing and Block Stacking), the clustering emerges according to "control-oriented" classes that are based on (a) the relative pose between the object and the target in the input or (b) the relative pose of the object induced by expert actions in the output. Each of the…
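As a rough illustration of how "control-oriented" classes could be derived from a relative pose, the sketch below assigns each (x, y, theta) triple to one of eight classes by the signs of its components. This partition and the name `pose_to_class` are assumptions made for the example; the paper's actual class definition may differ.

```python
import numpy as np

def pose_to_class(rel_pose):
    """Hypothetical labeling rule (an assumption, not the paper's definition).

    rel_pose: (N, 3) array of relative poses (x, y, theta).
    Returns (N,) integer labels in [0, 8), one per sign pattern (octant).
    """
    rel_pose = np.asarray(rel_pose, dtype=float)
    nonneg = (rel_pose >= 0).astype(int)  # 1 where the component is >= 0
    # Encode the three sign bits as a single integer: 4*x_bit + 2*y_bit + theta_bit.
    return 4 * nonneg[:, 0] + 2 * nonneg[:, 1] + nonneg[:, 2]

poses = np.array([
    [0.2, 0.1, 0.3],     # all components non-negative
    [-0.2, 0.1, -0.3],   # only y non-negative
    [-0.5, -0.4, -0.1],  # all components negative
])
labels = pose_to_class(poses)
```

Labels produced this way can stand in for the discrete action labels of a classification problem, which is what makes clustering diagnostics applicable to a continuous control task.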
Peer Reviews
Decision: ICLR 2025 Spotlight
To my knowledge, this is the first paper to study neural collapse in control. I appreciated some of the unconventional writing choices. The introduction had a nice unification of optimal control and behavior cloning and made the relevance of neural collapse to control very apparent. The experimental approach was really unique, creative, and insightful. The continual learning findings were especially interesting from my perspective.
The paper could benefit from a more concise abstract and a broader introduction that provides stronger motivation for the problem and a clearer articulation of the paper's contributions. I like the unconventional approach to the abstract, but it would be clearer to the audience if the toy experiment appeared as a preliminary to the method section. I took issue with the discussion of ResNet features throughout the paper. - First in response to L213: pre-trained ResNets are still useful com
- The study of neural collapse in the visuomotor control setting is novel and well-motivated.
- The minimum-time double integrator example is very clear.
- Sim and real experiments show that neural collapse regularization objectives improve downstream task performance.
- Fine-tuning experiments show that neural collapse happens when transferring across domains.
- It is unclear whether the downstream improvement comes from the NC regularization itself or from access to the ground-truth state (by proxy of the class labels).
- The criteria for determining whether neural collapse has occurred seem unclear in this setting.
- Experiments are mostly on the planar pushing task.
1. I like the interpretable way of converting a regression task (in the context of behavior cloning) of action prediction to (x, y, $\theta$), which can be discretized.
2. It is interesting to see that the authors have included experiments on a real robot for the task of planar pushing and especially the boost that pretraining to minimize NC metrics gives compared to the baseline.
3. The introduction could give more motivation for what NC is and why and where this phenomenon is observed, rather than elaborating too much on the toy example. For instance, in the **Our goal** paragraph (L112), the authors raise the question "Does a similar law of clustering, in the spirit of NC, happen when cloning image-based control policies?" However, it is not very clear at this point why this clustering is important to begin with. What happens if my control policies are working well and
Taxonomy
Topics: Video Analysis and Summarization · Data Visualization and Analytics
