Automatic Fine-grained Segmentation-assisted Report Generation

Frederic Jonske; Constantin Seibold; Osman Alperen Koras; Fin Bahnsen; Marie Bauer; Amin Dada; Hamza Kalisch; Anton Schily; Jens Kleesiek

arXiv:2507.16623·cs.CV·July 23, 2025

TL;DR

This paper introduces ASaRG, an extension of LLaVA that incorporates fine-grained segmentation maps for improved clinical report generation, demonstrating significant performance gains and enhanced grounding capabilities.

Contribution

ASaRG fuses segmentation maps into LLaVA's architecture, achieving better performance and interpretability in medical report generation with minimal additional parameters.

Findings

01 · +0.89% CE F1 score with intermediate features

02 · +2.77% CE F1 score with segmentation maps

03 · Performance gains over existing segmentation-based methods

Abstract

Reliable end-to-end clinical report generation has been a longstanding goal of medical ML research. The end goal for this process is to alleviate radiologists' workloads and provide second opinions to clinicians or patients. Thus, a necessary prerequisite for report generation models is a strong general performance and some type of innate grounding capability, to convince clinicians or patients of the veracity of the generated reports. In this paper, we present ASaRG (Automatic Segmentation-assisted Report Generation), an extension of the popular LLaVA architecture that aims to tackle both of these problems. ASaRG proposes to fuse intermediate features and fine-grained segmentation maps created by specialist radiological models into LLaVA's multi-modal projection layer via simple concatenation. With a small number of added parameters, our…
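The fusion-by-concatenation idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say along which axis the inputs are concatenated, so the sketch assumes a channel-wise concatenation per image patch, followed by a single linear projection. The function names (`pool_masks_to_patches`, `fuse_and_project`), the average-pooling step, and all dimensions are hypothetical choices made for illustration only.

```python
import numpy as np


def pool_masks_to_patches(masks, grid):
    """Average-pool masks of shape (C, H, W) onto a grid x grid patch layout.

    Returns an array of shape (grid * grid, C) holding, for every patch,
    the fraction of its pixels covered by each segmentation class.
    (Hypothetical helper; the paper may encode masks differently.)
    """
    c, h, w = masks.shape
    ph, pw = h // grid, w // grid
    # Crop any remainder so the image divides evenly into patches.
    cropped = masks[:, : ph * grid, : pw * grid]
    blocks = cropped.reshape(c, grid, ph, grid, pw)
    coverage = blocks.mean(axis=(2, 4))          # (C, grid, grid)
    return coverage.reshape(c, grid * grid).T    # (grid * grid, C)


def fuse_and_project(patch_feats, seg_feats, weight, bias):
    """Concatenate per-patch features channel-wise, then project linearly.

    Assumption: concatenation happens along the feature axis, so the
    projection only needs extra input rows for the segmentation channels.
    """
    fused = np.concatenate([patch_feats, seg_feats], axis=-1)
    return fused @ weight + bias


# Toy dimensions, chosen only for illustration.
rng = np.random.default_rng(0)
grid, d_img, n_classes, d_llm = 4, 8, 3, 12

patch_feats = rng.normal(size=(grid * grid, d_img))   # stand-in for vision-encoder output
masks = (rng.random((n_classes, 32, 32)) > 0.5).astype(np.float32)  # stand-in for specialist masks
seg_feats = pool_masks_to_patches(masks, grid)

weight = rng.normal(size=(d_img + n_classes, d_llm)) * 0.02
bias = np.zeros(d_llm)

tokens = fuse_and_project(patch_feats, seg_feats, weight, bias)
print(tokens.shape)  # (16, 12)
```

Under this assumption, the added parameters are only the extra `n_classes` rows of `weight`, which fits the abstract's mention of "a small number of added parameters", though the actual parameter count in ASaRG may be computed differently.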

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques