TL;DR
This paper introduces a control-theoretic framework, TIF-GRPO, to improve medical vision-language models for 3D CT analysis by reducing hallucinations and enhancing clinical accuracy.
Contribution
It proposes a novel trajectory-integral feedback method that aligns model rewards with clinical correctness, addressing limitations of standard reinforcement learning in medical imaging.
Findings
Enhanced abnormality detection accuracy on 3D CT benchmarks.
Significant reduction in clinical hallucinations and factual errors.
Establishment of a new regulation paradigm for medical vision-language models.
Abstract
Medical vision-language models (VLMs) have rapidly advanced as general-purpose multimodal assistants, yet their deployment in 3D Computed Tomography (CT) analysis remains constrained by a persistent mismatch between optimization objectives and clinical rigor. Current Reinforcement Learning (RL) paradigms still rely on lexical proxy signals that induce "Evaluation Hallucinations", where models optimize linguistic fluency rather than factual clinical correctness, leading to diagnostically critical errors. To bridge this gap, we introduce the Clinical Abnormality Benchmarking Substrate (CABS), a structured system that decomposes radiology reports into verifiable clinical semantic units. Using CABS, we identify a "Mechanistic Divergence" in standard RL, where surface-similarity rewards drive policy gradients to bypass medical facts. We therefore propose…
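The gap the abstract describes between surface-similarity rewards and factual correctness can be illustrated with a toy comparison. The sketch below is not the paper's CABS or TIF-GRPO implementation; the function names, the `(finding, location, present)` tuple schema, and the example reports are all assumptions made for illustration, and the semantic units are written by hand rather than extracted from text.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Lexical proxy reward: F1 over whitespace tokens (multiset overlap)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def unit_f1(pred_units, ref_units) -> float:
    """Fact-level reward: F1 over sets of (finding, location, present) tuples.

    The tuple schema is a hypothetical stand-in for "verifiable clinical
    semantic units"; it is not taken from the paper.
    """
    pred, ref = set(pred_units), set(ref_units)
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)


# Made-up reports: the candidate reuses every reference token but flips both findings.
reference = "no pleural effusion . nodule in the right upper lobe"
candidate = "pleural effusion . no nodule in the right upper lobe"

# Hand-written units for each report (extraction from text is out of scope here).
reference_units = [
    ("pleural effusion", "pleura", False),
    ("nodule", "right upper lobe", True),
]
candidate_units = [
    ("pleural effusion", "pleura", True),
    ("nodule", "right upper lobe", False),
]

lexical_reward = token_f1(candidate, reference)
factual_reward = unit_f1(candidate_units, reference_units)
print(f"lexical reward: {lexical_reward:.2f}, factual reward: {factual_reward:.2f}")
```

In this toy case the lexical score is perfect even though both findings are wrong, while the unit-level score is zero. A policy optimized only against the first signal would have no incentive to correct the error.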
