Process Reward Models for Sentence-Level Verification of LVLM Radiology Reports
Alois Thomas, Maya Varma, Jean-Benoit Delbrouck, and Curtis P. Langlotz

TL;DR
This paper introduces a sentence-level Process Reward Model (PRM) that effectively verifies the factual correctness of radiology report sentences generated by LVLMs, improving safety and accuracy in clinical report generation.
Contribution
The paper presents a novel, lightweight PRM that outperforms existing methods, generalizes across different LVLMs, and enhances report quality and clinical metric performance.
Findings
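PRM outperforms existing verification techniques, with a 7.5% relative improvement in Matthews Correlation Coefficient (MCC) over strong white-box baselines on outputs from one LVLM.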
PRM improves report filtering, increasing F1-CheXbert scores by 4.5%.
PRM enhances clinical metrics by 7.4% in weighted selection.
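For readers unfamiliar with the headline metric, the following is a minimal sketch of how a Matthews Correlation Coefficient and a relative improvement can be computed for binary sentence-level labels (1 = factually correct, 0 = incorrect). The function names and the example numbers are illustrative assumptions, not values or code from the paper.

```python
import math
from typing import Sequence


def matthews_corrcoef(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """MCC for binary labels, computed from the confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def relative_improvement(new: float, baseline: float) -> float:
    """Relative (not absolute) gain of `new` over `baseline`."""
    return (new - baseline) / baseline


# Hypothetical sentence labels and verifier predictions, for illustration only.
labels = [1, 1, 0, 0, 1, 0]
preds = [1, 1, 0, 1, 1, 0]
print(matthews_corrcoef(labels, preds))
# Hypothetical scores: a baseline MCC of 0.40 versus a new MCC of 0.43.
print(relative_improvement(0.43, 0.40))
```

MCC uses all four confusion-matrix cells, so it stays informative when correct sentences far outnumber hallucinated ones, a setting where plain accuracy can look deceptively high.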
Abstract
Automating radiology report generation with Large Vision-Language Models (LVLMs) holds great potential, yet these models often produce clinically critical hallucinations, posing serious risks. Existing hallucination detection methods frequently lack the necessary sentence-level granularity or robust generalization across different LVLM generators. We introduce a novel approach: a sentence-level Process Reward Model (PRM) adapted for this vision-language task. Our PRM predicts the factual correctness of each generated sentence, conditioned on clinical context and preceding text. When fine-tuned on MIMIC-CXR with weakly-supervised labels, a lightweight 0.5B-parameter PRM outperforms existing verification techniques, demonstrating, for instance, relative improvements of 7.5% in Matthews Correlation Coefficient and 1.8% in AUROC over strong white-box baselines on outputs from one LVLM.…
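The conditioning described in the abstract (each sentence judged given the clinical context and the preceding text) can be sketched as a simple data-preparation step. This is a hypothetical illustration: the class name, field names, and the choice to join preceding sentences with spaces are assumptions, the image input is omitted, and the paper's actual input format may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VerificationExample:
    """One sentence to verify, with the text it is conditioned on (hypothetical layout)."""
    context: str     # clinical context, e.g. the study indication
    preceding: str   # report sentences generated before this one
    sentence: str    # the sentence whose factual correctness is predicted


def build_examples(context: str, sentences: List[str]) -> List[VerificationExample]:
    """Create one verification example per generated sentence."""
    examples = []
    for i, sentence in enumerate(sentences):
        # Sentence i sees only sentences 0..i-1, mirroring step-wise verification.
        preceding = " ".join(sentences[:i])
        examples.append(VerificationExample(context, preceding, sentence))
    return examples


# Made-up report text, for illustration only.
report = ["No pneumothorax.", "Heart size is normal.", "Small left pleural effusion."]
for ex in build_examples("Indication: cough.", report):
    print(repr(ex.preceding), "->", ex.sentence)
```

Each example would then be scored by the verifier to obtain a per-sentence correctness prediction; that scoring model is not shown here.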
