Insights into a radiology-specialised multimodal large language model with sparse autoencoders

Kenza Bouzid; Shruthi Bannur; Felix Meissen; Daniel Coelho de Castro; Anton Schwaighofer; Javier Alvarez-Valle; Stephanie L. Hyland

arXiv:2507.12950 · cs.LG · July 21, 2025

TL;DR

This paper applies sparse autoencoders to interpret a radiology-specialised multimodal large language model. It reveals clinically relevant internal features and demonstrates initial control over model outputs, advancing transparency in medical AI.

Contribution

The paper applies Matryoshka-SAE to the mechanistic interpretability of MAIRA-2, a radiology-focused multimodal large language model, characterising the model's internal representations and the challenges of interpreting them.
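The Matryoshka idea can be sketched as follows: a single SAE dictionary is trained so that nested prefixes of its latents (the first m_1 latents, the first m_2, and so on up to the full dictionary) must each reconstruct the input on their own, which is intended to push general features into the early latents and more specific ones into later latents. The NumPy sketch below computes only the forward pass and loss. The function name `matryoshka_sae_loss`, the ReLU encoder, the L1 penalty, the prefix sizes and the random weights are all illustrative assumptions, not the configuration used for MAIRA-2; actual training would wrap this loss in an autodiff framework.

```python
import numpy as np

def matryoshka_sae_loss(x, W_enc, b_enc, W_dec, b_dec, prefix_sizes, l1_coef=1e-3):
    """Loss of a toy (hypothetical) Matryoshka SAE on x of shape (batch, d_model).

    W_enc: (d_model, n_latents); W_dec: (n_latents, d_model).
    prefix_sizes: increasing nested dictionary sizes; the last should equal
    n_latents so that the full dictionary is trained as well.
    """
    # Encode with a ReLU (illustrative choice; TopK-style encoders also exist).
    f = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)  # (batch, n_latents)
    recon_loss = 0.0
    for m in prefix_sizes:
        # Reconstruct using only the first m latents and their decoder rows.
        x_hat = f[:, :m] @ W_dec[:m] + b_dec
        recon_loss += np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    # L1 sparsity penalty over all latent activations.
    sparsity = l1_coef * np.mean(np.sum(np.abs(f), axis=-1))
    return recon_loss + sparsity, f

# Usage on random data (shapes chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
d_model, n_latents = 16, 64
x = rng.normal(size=(8, d_model))
W_enc = rng.normal(scale=0.1, size=(d_model, n_latents))
W_dec = W_enc.T.copy()
b_enc = np.zeros(n_latents)
b_dec = np.zeros(d_model)
loss, f = matryoshka_sae_loss(x, W_enc, b_enc, W_dec, b_dec, prefix_sizes=[8, 16, 32, 64])
```

Summing the reconstruction error over every prefix, rather than only the full dictionary, is what distinguishes this from a standard SAE loss.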

Findings

01. Identified clinically relevant concepts, such as medical devices and pathologies, within MAIRA-2.
02. Demonstrated directional control over model generations through feature steering.
03. Revealed practical challenges in interpreting complex multimodal models.
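Feature steering of the kind named in the second finding is commonly implemented by adding a scaled SAE decoder direction to the model's hidden states at the layer the SAE was trained on. A minimal sketch under that assumption follows; the function `steer`, the use of a unit-norm direction and the scale `alpha` are illustrative, and the paper's exact intervention (layer, scaling, clamping) may differ. In a real model this would run inside a forward hook on the relevant layer rather than on a standalone array.

```python
import numpy as np

def steer(hidden, W_dec, feature_idx, alpha):
    """Add alpha times the unit-norm decoder direction of one SAE feature.

    hidden: (seq_len, d_model) hidden states at the SAE's layer.
    W_dec:  (n_latents, d_model) SAE decoder; row i is feature i's direction.
    A positive alpha promotes the feature; a negative alpha suppresses it.
    """
    direction = W_dec[feature_idx] / np.linalg.norm(W_dec[feature_idx])
    # Broadcasting adds the same vector at every token position.
    return hidden + alpha * direction

# Usage on random data (hypothetical shapes and feature index).
rng = np.random.default_rng(1)
hidden = rng.normal(size=(5, 16))
W_dec = rng.normal(size=(64, 16))
steered = steer(hidden, W_dec, feature_idx=3, alpha=4.0)
u = W_dec[3] / np.linalg.norm(W_dec[3])
# Each token's projection onto the feature direction grows by exactly alpha.
shift = (steered - hidden) @ u
```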

Abstract

Interpretability can improve the safety, transparency and trust of AI models, which is especially important in healthcare applications where decisions often carry significant consequences. Mechanistic interpretability, particularly through the use of sparse autoencoders (SAEs), offers a promising approach for uncovering human-interpretable features within large transformer-based models. In this study, we apply Matryoshka-SAE to the radiology-specialised multimodal large language model, MAIRA-2, to interpret its internal representations. Using large-scale automated interpretability of the SAE features, we identify a range of clinically relevant concepts, including medical devices (e.g., line and tube placements, pacemaker presence), pathologies such as pleural effusion and cardiomegaly, longitudinal changes and textual features. We further examine the influence of these features on…

Code & Models

Models

microsoft/maira-2-sae (Hugging Face model)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare