Sparse Autoencoders for Interpretable Medical Image Representation Learning

Philipp Wesp; Robbie Holland; Vasiliki Sideri-Lampretsa; Sergios Gatidis

arXiv:2603.23794 · cs.CV · March 26, 2026

TL;DR

This study investigates Sparse Autoencoders (SAEs) for converting medical image embeddings from vision foundation models into sparse, human-interpretable features that reconstruct the original embeddings with high fidelity, preserve their semantics, and can be described in language.

Contribution

The paper shows that Sparse Autoencoders can produce sparse, interpretable features from medical image embeddings with little information loss while preserving semantic fidelity, advancing interpretability in medical vision models.

Findings

01 High-fidelity reconstruction of embeddings (R² up to 0.941)

02 Recovery of up to 87.8% of downstream performance with only 10 features

03 Semantic fidelity preserved in image retrieval tasks
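The reconstruction metric in the first finding can be made concrete with a short sketch. It assumes R² is the standard coefficient of determination pooled over all embedding dimensions; this page does not state the paper's exact definition, and the function name `r2_score` and the toy vectors are illustrative, not taken from the authors' code.

```python
def r2_score(originals, reconstructions):
    """Fraction of variance in the original embeddings explained by the reconstructions.

    Both arguments are lists of equal-length float vectors (one vector per image).
    Assumption: variance is pooled over all embedding dimensions.
    """
    n = len(originals)
    d = len(originals[0])
    # Per-dimension mean of the original embeddings.
    means = [sum(row[j] for row in originals) / n for j in range(d)]
    # Residual sum of squares between originals and reconstructions.
    ss_res = sum((x - y) ** 2
                 for orig, rec in zip(originals, reconstructions)
                 for x, y in zip(orig, rec))
    # Total sum of squares around the per-dimension means.
    ss_tot = sum((x - means[j]) ** 2
                 for row in originals
                 for j, x in enumerate(row))
    return 1.0 - ss_res / ss_tot


# Toy data (hypothetical): three 2-dimensional "embeddings".
originals = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mean_baseline = [[3.0, 4.0]] * 3  # always predicting the mean vector

perfect = r2_score(originals, originals)       # exact reconstruction
baseline = r2_score(originals, mean_baseline)  # no variance explained
```

Under this definition an exact reconstruction scores 1 and a constant mean prediction scores 0, so a value of 0.941 would mean the sparse features explain most, but not all, of the variance in the original embeddings.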

Abstract

Vision foundation models (FMs) achieve state-of-the-art performance in medical imaging. However, they encode information in abstract latent representations that clinicians cannot interrogate or verify. The goal of this study is to investigate Sparse Autoencoders (SAEs) for replacing opaque FM image representations with human-interpretable, sparse features. We train SAEs on embeddings from BiomedParse (biomedical) and DINOv3 (general-purpose) using 909,873 CT and MRI 2D image slices from the TotalSegmentator dataset. We find that learned sparse features: (a) reconstruct original embeddings with high fidelity (R² up to 0.941) and recover up to 87.8% of downstream performance using only 10 features (99.4% dimensionality reduction), (b) preserve semantic fidelity in image retrieval tasks, (c) correspond to specific concepts that can be expressed in language using large language model…
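As a rough illustration of what an SAE does to a single embedding, the sketch below runs one forward pass: a linear encoder with ReLU, a sparsity step, and a linear decoder. The top-k sparsity rule, the toy sizes, the random weights, and the name `topk_sae_forward` are all assumptions made for illustration; the abstract does not specify the SAE variant or hyperparameters the authors used.

```python
import random


def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Encode embedding x into an (at most) k-sparse code and decode it back.

    x: list of d floats. W_enc: m rows of d floats. W_dec: d rows of m floats.
    Assumption: sparsity is enforced by keeping the k largest activations.
    """
    # Encoder: one non-negative activation per dictionary feature (ReLU).
    pre = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
           for row, b in zip(W_enc, b_enc)]
    # Keep only the k largest activations and zero out the rest.
    keep = set(sorted(range(len(pre)), key=lambda j: pre[j], reverse=True)[:k])
    z = [a if j in keep else 0.0 for j, a in enumerate(pre)]
    # Decoder: linear reconstruction of the embedding from the sparse code.
    x_hat = [sum(w * zj for w, zj in zip(row, z)) + b
             for row, b in zip(W_dec, b_dec)]
    return z, x_hat


random.seed(0)
d, m, k = 8, 32, 4  # toy sizes chosen for the example, not the paper's
x = [random.gauss(0, 1) for _ in range(d)]
W_enc = [[random.gauss(0, 0.3) for _ in range(d)] for _ in range(m)]
b_enc = [0.0] * m
W_dec = [[random.gauss(0, 0.3) for _ in range(m)] for _ in range(d)]
b_dec = [0.0] * d

z, x_hat = topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k)
n_active = sum(1 for a in z if a != 0.0)  # at most k features fire
```

With untrained random weights the reconstruction `x_hat` is meaningless; in practice the weights would be fit to minimize reconstruction error on the FM embeddings, so that each of the few active entries of `z` can be inspected as a candidate concept.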

Code & Models

Model (Hugging Face): pwesp/sail

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)