LLM-Bootstrapped Targeted Finding Guidance for Factual MLLM-based Medical Report Generation

Cunyuan Yang; Dejuan Song; Xiaotao Pang; Qianqian Shen; Wenjie Nie; Yifan Huang; Lei Wu; Wei Han; Haishuai Wang; Jiajun Bu

arXiv:2603.00426 · cs.CL · March 3, 2026

Open Access

TL;DR

This paper introduces Fact-Flow, a novel framework that improves factual accuracy in medical report generation by separating visual fact identification from report creation, leveraging LLMs for dataset labeling, and demonstrating superior results on medical datasets.

Contribution

The paper presents Fact-Flow, a new approach that enhances factual correctness in medical reports by decoupling fact detection from report generation and using LLMs for automatic dataset creation.

Findings

01. Significant improvement in factual accuracy over state-of-the-art models.

02. Effective use of LLMs for automatic medical findings dataset creation.

03. Maintains high-quality text generation in medical reports.

Abstract

The automatic generation of medical reports utilizing Multimodal Large Language Models (MLLMs) frequently encounters challenges related to factual instability, which may manifest as the omission of findings or the incorporation of inaccurate information, thereby constraining their applicability in clinical settings. Current methodologies typically produce reports based directly on image features, which inherently lack a definitive factual basis. In response to this limitation, we introduce Fact-Flow, an innovative framework that separates the process of visual fact identification from the generation of reports. This is achieved by initially predicting clinical findings from the image, which subsequently directs the MLLM to produce a report that is factually precise. A pivotal advancement of our approach is a pipeline that leverages a Large Language Model (LLM) to autonomously create a…
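
The two-stage idea described in the abstract (first predict clinical findings, then let those findings direct report generation) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the names `Finding`, `select_findings`, `build_guided_prompt`, and `generate_report`, the 0.5 threshold, the prompt wording, and the example finding labels are all assumptions made for illustration, and the `generator` callable stands in for whatever MLLM is used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Finding:
    name: str      # hypothetical finding label, e.g. "cardiomegaly"
    present: bool  # whether the stage-1 predictor judged it present


def select_findings(scores: Dict[str, float], threshold: float = 0.5) -> List[Finding]:
    """Stage 1 stand-in: turn per-finding scores into explicit present/absent facts.

    The scores would come from some image-level predictor; here they are
    simply passed in, since the predictor itself is outside this sketch.
    """
    return [Finding(name, score >= threshold) for name, score in sorted(scores.items())]


def build_guided_prompt(findings: List[Finding]) -> str:
    """Stage 2 input: state the predicted findings explicitly in the prompt."""
    present = [f.name for f in findings if f.present]
    absent = [f.name for f in findings if not f.present]
    lines = [
        "Write a medical report consistent with these findings.",
        "Present: " + (", ".join(present) if present else "none"),
        "Absent: " + (", ".join(absent) if absent else "none"),
    ]
    return "\n".join(lines)


def generate_report(
    scores: Dict[str, float],
    generator: Callable[[str], str],
    threshold: float = 0.5,
) -> str:
    """Run both stages: findings first, then generation conditioned on them."""
    findings = select_findings(scores, threshold)
    return generator(build_guided_prompt(findings))
```

Keeping the findings as an explicit intermediate value means the facts the generator is asked to respect can be inspected or checked before any report text is produced, which is the point of separating the two steps.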

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning in Healthcare