Explaining the Impact of Training on Vision Models via Activation Clustering
Ahcène Boubekki, Samuel G. Fadel, Sebastian Mair

TL;DR
This paper presents NAVE, a clustering-based method for interpreting the internal representations of vision models, and uses it to show how training strategies and training artifacts influence model semantics and performance.
Contribution
The paper introduces NAVE, a novel visualization technique for understanding vision model encodings without fine-tuning, and demonstrates its effectiveness in analyzing training impacts and artifacts.
Findings
The concepts NAVE extracts align with image semantics, as shown through object localization.
Training strategies significantly affect encoder representations.
Weak training and spurious correlations degrade model performance.
Abstract
This paper introduces Neuro-Activated Vision Explanations (NAVE), a method for extracting and visualizing the internal representations of vision model encoders. By clustering feature activations, NAVE provides insights into learned semantics without fine-tuning. Using object localization, we show that NAVE's concepts align with image semantics. Through extensive experiments, we analyze the impact of training strategies and architectures on encoder representation capabilities. Additionally, we apply NAVE to study training artifacts in vision transformers and reveal how weak training strategies and spurious correlations degrade model performance. Our findings establish NAVE as a valuable tool for post-hoc model inspection and improving transparency in vision models.
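The abstract's core idea, clustering the feature activations of a frozen encoder, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cluster_activations`, the choice of plain k-means, the `(H, W, C)` feature-map layout, and all parameter defaults are assumptions made for illustration only.

```python
import numpy as np


def cluster_activations(feature_map, n_clusters=5, n_iters=20, seed=0):
    """Group the spatial positions of one feature map with plain k-means.

    feature_map: array of shape (H, W, C), e.g. activations taken from a
    frozen encoder (hypothetical layout, assumed for this sketch).
    Returns an (H, W) integer map assigning each position to a cluster.
    """
    h, w, c = feature_map.shape
    x = feature_map.reshape(h * w, c).astype(np.float64)

    # Initialise centroids from randomly chosen spatial positions.
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(h * w, size=n_clusters, replace=False)]

    def assign(points, centers):
        # Squared Euclidean distance from every position to every centroid.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        return dists.argmin(axis=1)

    for _ in range(n_iters):
        labels = assign(x, centroids)
        for k in range(n_clusters):
            members = x[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)

    # Final assignment with the updated centroids, reshaped to the spatial grid.
    return assign(x, centroids).reshape(h, w)
```

Upsampling the resulting cluster map to the input resolution would give a coarse segmentation-like view of what the encoder groups together, and no encoder weights are updated at any point, which matches the "without fine-tuning" framing.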
Taxonomy
Topics: Infrared Target Detection Methodologies · Neural Networks and Applications
