Sparse Autoencoders for Interpretable Medical Image Representation Learning
Philipp Wesp, Robbie Holland, Vasiliki Sideri-Lampretsa, Sergios Gatidis

TL;DR
This study explores Sparse Autoencoders as a means to create human-interpretable, sparse features from medical image embeddings, enabling high-fidelity reconstruction, semantic understanding, and language-based interpretability in medical imaging.
Contribution
The paper demonstrates that Sparse Autoencoders can produce sparse, interpretable features from medical image embeddings while losing little information and preserving semantic fidelity, advancing interpretability in medical vision models.
Findings
High-fidelity reconstruction of embeddings (R² up to 0.941)
Recovery of up to 87.8% of downstream performance with only 10 features
Semantic fidelity preserved in image retrieval tasks
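To make the reconstruction metric above concrete, here is a minimal sketch of how a coefficient of determination could be computed between original and reconstructed embeddings. The listing does not state which R² variant the authors use, so the pooled form below (residual and total sums of squares taken over all samples and dimensions, with a per-dimension mean) is an assumption, and the function name `r2_embeddings` is hypothetical.

```python
import numpy as np

def r2_embeddings(original, reconstructed):
    """Pooled R^2 between two (n_samples, n_dims) arrays.

    Assumption: the total sum of squares is measured around the
    per-dimension mean of the original embeddings.
    """
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    ss_res = np.sum((original - reconstructed) ** 2)
    ss_tot = np.sum((original - original.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

# Illustrative data only: random vectors standing in for image embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(32, 16))
noisy = emb + 0.1 * rng.normal(size=emb.shape)
score = r2_embeddings(emb, noisy)
```

Under this definition a perfect reconstruction scores 1, and predicting the per-dimension mean for every sample scores 0.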
Abstract
Vision foundation models (FMs) achieve state-of-the-art performance in medical imaging. However, they encode information in abstract latent representations that clinicians cannot interrogate or verify. The goal of this study is to investigate Sparse Autoencoders (SAEs) for replacing opaque FM image representations with human-interpretable, sparse features. We train SAEs on embeddings from BiomedParse (biomedical) and DINOv3 (general-purpose) using 909,873 CT and MRI 2D image slices from the TotalSegmentator dataset. We find that learned sparse features: (a) reconstruct original embeddings with high fidelity (R² up to 0.941) and recover up to 87.8% of downstream performance using only 10 features (99.4% dimensionality reduction), (b) preserve semantic fidelity in image retrieval tasks, (c) correspond to specific concepts that can be expressed in language using large language model…
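As a rough illustration of the idea in the abstract, the sketch below shows the forward pass of a sparse autoencoder that maps an embedding to a wide feature vector, keeps only a handful of active features, and decodes back to the embedding space. The abstract does not specify the architecture or how sparsity is enforced, so the top-k selection, the ReLU, the tied initialisation, and every name and dimension here (`init_sae`, `encode_topk`, `decode`, 64 input dimensions, 256 features) are assumptions chosen for illustration, not the authors' method. The weights are random and untrained.

```python
import numpy as np

def init_sae(d_embed, d_features, seed=0):
    """Random, untrained parameters for a hypothetical SAE."""
    rng = np.random.default_rng(seed)
    w_enc = rng.normal(0.0, 1.0 / np.sqrt(d_embed), size=(d_embed, d_features))
    return {
        "w_enc": w_enc,
        "b_enc": np.zeros(d_features),
        "w_dec": w_enc.T.copy(),  # decoder starts as the encoder transpose
        "b_dec": np.zeros(d_embed),
    }

def encode_topk(x, params, k):
    """Encode x of shape (n, d_embed), keeping at most k features per row."""
    pre = (x - params["b_dec"]) @ params["w_enc"] + params["b_enc"]
    act = np.maximum(pre, 0.0)  # ReLU keeps feature activations non-negative
    # Column indices of the k largest activations in each row.
    idx = np.argpartition(act, -k, axis=1)[:, -k:]
    rows = np.arange(act.shape[0])[:, None]
    z = np.zeros_like(act)
    z[rows, idx] = act[rows, idx]
    return z

def decode(z, params):
    """Map sparse features back to the embedding space."""
    return z @ params["w_dec"] + params["b_dec"]

# Illustrative run: random vectors stand in for foundation-model embeddings.
rng = np.random.default_rng(1)
x = rng.normal(size=(4, 64))
params = init_sae(d_embed=64, d_features=256)
z = encode_topk(x, params, k=10)
x_hat = decode(z, params)
```

In a trained model of this kind, each of the few non-zero entries of `z` would be a candidate for a human-readable concept, and the quality of `x_hat` would be what a reconstruction metric such as R² measures.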
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)
