MApLe: Multi-instance Alignment of Diagnostic Reports and Large Medical Images

Felicia Bader; Philipp Seeböck; Anastasia Bartashova; Ulrike Attenberger; Georg Langs

arXiv:2604.13970 · cs.CV · April 16, 2026

TL;DR

MApLe is a multi-task, multi-instance vision-language alignment method that links local image patches in large medical images to individual sentences of diagnostic reports, with the aim of improving both interpretability and alignment performance.

Contribution

MApLe disentangles the concepts of anatomical region and diagnostic finding, modeling each separately, and aligns local image information with report sentences patch by patch. The authors report improved alignment over existing models.

Findings

1. MApLe outperforms baseline models in alignment tasks.
2. The model successfully links image regions with report sentences.
3. Code is publicly available at https://github.com/cirmuw/MApLe.

Abstract

In diagnostic reports, experts encode complex imaging data into clinically actionable information. They describe subtle pathological findings that are meaningful in their anatomical context. Reports follow relatively consistent structures, expressing diagnostic information with few words that are often associated with tiny but consequential image observations. Standard vision language models struggle to identify the associations between these informative text components and small locations in the images. Here, we propose "MApLe", a multi-task, multi-instance vision language alignment approach that overcomes these limitations. It disentangles the concepts of anatomical region and diagnostic finding, and links local image information to sentences in a patch-wise approach. Our method consists of a text embedding trained to capture anatomical and diagnostic concepts in sentences, a…
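To make the idea of patch-wise, multi-instance alignment concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names, the cosine similarity, and the max-over-patches pooling are assumptions chosen for illustration, and the embeddings are random stand-ins for the outputs of real image and text encoders.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def patch_sentence_similarity(patches, sentences):
    """Cosine similarity between every sentence and every image patch.

    patches:   (n_patches, d) array of patch embeddings (hypothetical encoder output)
    sentences: (n_sentences, d) array of sentence embeddings (hypothetical encoder output)
    returns:   (n_sentences, n_patches) similarity matrix
    """
    return l2_normalize(sentences) @ l2_normalize(patches).T


def mil_report_score(patches, sentences):
    """Multi-instance pooling (an assumed choice): each sentence is scored by
    its best-matching patch, and the report score is the mean over sentences."""
    sim = patch_sentence_similarity(patches, sentences)
    best_patch = sim.argmax(axis=1)   # index of the most similar patch per sentence
    per_sentence = sim.max(axis=1)    # similarity to that patch
    return per_sentence.mean(), best_patch


# Toy data: 16 patches with 8-dimensional embeddings; two "sentences" are
# constructed as slightly perturbed copies of patches 3 and 11.
rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))
sentences = patches[[3, 11]] + 0.01 * rng.normal(size=(2, 8))

score, best = mil_report_score(patches, sentences)
print("report score:", round(float(score), 4))
print("best patch per sentence:", best.tolist())
```

In this toy setup each sentence is matched to the patch it was derived from, which is the kind of sentence-to-location link the abstract describes. A trained model would learn the embeddings so that such links emerge from report-level supervision alone.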

Code & Models

Repository: cirmuw/MApLe (GitHub)
