Insights into a radiology-specialised multimodal large language model with sparse autoencoders
Kenza Bouzid, Shruthi Bannur, Felix Meissen, Daniel Coelho de Castro, Anton Schwaighofer, Javier Alvarez-Valle, Stephanie L. Hyland

TL;DR
This paper applies sparse autoencoders to interpret a radiology-specialised multimodal large language model, revealing clinically relevant internal features and demonstrating initial control over model outputs, advancing transparency in medical AI.
Contribution
The paper applies Matryoshka-SAE to the mechanistic interpretability of MAIRA-2, a radiology-focused multimodal large language model, exposing the concepts encoded in its internal representations and the practical challenges of interpreting them.
Findings
Identified clinically relevant concepts like medical devices and pathologies within MAIRA-2
Demonstrated directional control over model generations through feature steering
Revealed practical challenges in interpretability of complex multimodal models
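To make the steering finding concrete, the following is a minimal sketch of one common form of feature steering: adding a scaled SAE decoder direction to a model's hidden states. It is an illustration under assumptions, not the authors' procedure. The function name `steer`, the toy shapes and the `strength` parameter are all hypothetical, and the summary does not say which layer or scaling the paper uses.

```python
import numpy as np

def steer(hidden, decoder_direction, strength):
    """Shift every token's hidden state along one unit-normalised feature direction.

    hidden: (tokens, d_model) activations (toy values here, not MAIRA-2 activations).
    decoder_direction: (d_model,) decoder vector of the SAE feature (hypothetical).
    strength: scalar steering coefficient (hypothetical).
    """
    direction = decoder_direction / np.linalg.norm(decoder_direction)
    return hidden + strength * direction

rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 16))        # 4 tokens, hidden size 16
feature_direction = rng.normal(size=16)  # stand-in for a learned decoder row

steered = steer(hidden, feature_direction, strength=5.0)

# Projection of the change onto the unit feature direction, one value per token.
unit = feature_direction / np.linalg.norm(feature_direction)
shift = (steered - hidden) @ unit
```

In this sketch each token moves by exactly `strength` along the feature direction and is unchanged in every orthogonal direction. A negative `strength` would suppress the feature instead.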
Abstract
Interpretability can improve the safety, transparency and trust of AI models, which is especially important in healthcare applications where decisions often carry significant consequences. Mechanistic interpretability, particularly through the use of sparse autoencoders (SAEs), offers a promising approach for uncovering human-interpretable features within large transformer-based models. In this study, we apply Matryoshka-SAE to the radiology-specialised multimodal large language model, MAIRA-2, to interpret its internal representations. Using large-scale automated interpretability of the SAE features, we identify a range of clinically relevant concepts - including medical devices (e.g., line and tube placements, pacemaker presence), pathologies such as pleural effusion and cardiomegaly, longitudinal changes and textual features. We further examine the influence of these features on…
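As a rough illustration of the kind of model the abstract refers to, the sketch below shows a sparse autoencoder with a Matryoshka-style objective, in which nested prefixes of the latent features must each reconstruct the input on their own. It assumes a plain ReLU encoder and omits any sparsity mechanism. All names, shapes and prefix sizes are hypothetical, and none of this is taken from the paper's implementation.

```python
import numpy as np

def encode(x, W_enc, b_enc):
    """ReLU encoder: maps activations to non-negative feature activations."""
    return np.maximum(0.0, x @ W_enc + b_enc)

def decode(f, W_dec, b_dec, k=None):
    """Reconstruct from the first k features only (all features if k is None)."""
    k = f.shape[-1] if k is None else k
    return f[..., :k] @ W_dec[:k] + b_dec

def matryoshka_loss(x, W_enc, b_enc, W_dec, b_dec, prefix_sizes):
    """Sum of mean squared reconstruction errors over nested feature prefixes."""
    f = encode(x, W_enc, b_enc)
    losses = [float(np.mean((x - decode(f, W_dec, b_dec, k)) ** 2))
              for k in prefix_sizes]
    return sum(losses), losses

rng = np.random.default_rng(0)
d_model, n_features = 16, 64                 # toy sizes
x = rng.normal(size=(8, d_model))            # stand-in for model activations
W_enc = rng.normal(size=(d_model, n_features)) * 0.1
b_enc = np.zeros(n_features)
W_dec = rng.normal(size=(n_features, d_model)) * 0.1
b_dec = np.zeros(d_model)

total, per_prefix = matryoshka_loss(
    x, W_enc, b_enc, W_dec, b_dec, prefix_sizes=[16, 32, 64]
)
```

Training would minimise `total` with respect to the weights. Because the early features must reconstruct the input without help from the later ones, they are pushed towards more general concepts, while later features can capture finer detail.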
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare
