Improving Medical VQA through Trajectory-Aware Process Supervision

Halil Ibrahim Gulluk; Olivier Gevaert

arXiv:2605.04064·cs.LG·May 7, 2026

TL;DR

The paper generates reasoning trajectories for six medical VQA benchmarks, then trains a model with supervised fine-tuning followed by GRPO using a trajectory-aware process reward. With this reward, reported accuracy rises from 0.598 to 0.689.

Contribution

The paper proposes a two-stage training framework for medical VQA: supervised fine-tuning on generated reasoning trajectories, followed by Group Relative Policy Optimization (GRPO) with a trajectory-aware reward that supervises the reasoning process as well as the final answer.

Findings

1. The trajectory-aware reward improves accuracy from 0.598 to 0.689.

2. Combining the DTW-based process reward with the exact-match reward improves BERTScore and ROUGE-L.

3. The generated reasoning datasets and code are publicly available.

Abstract

Reasoning capabilities are crucial for reliable medical visual question answering (VQA); however, existing datasets rarely include reasoning explanations. We address this by generating reasoning trajectories for six medical VQA benchmarks using the COMCTS algorithm with open-source vision-language models, with an LLM serving as the verification judge. Building on these generated datasets, we propose a two-stage training framework: supervised fine-tuning followed by Group Relative Policy Optimization (GRPO) with a novel process-based reward. While standard approaches rely solely on exact-match rewards for final answers, we introduce a trajectory-aware reward that measures the similarity between generated and ground-truth reasoning processes. Specifically, we embed reasoning steps using sentence transformers and compute the Dynamic Time Warping (DTW) distance between the resulting…
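The trajectory-aware reward described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract is truncated here, so the local cost (cosine distance), the mapping from DTW distance to a reward (`1 / (1 + d)`), the mixing `weight`, and all function names are assumptions made for illustration. Plain Python lists stand in for the sentence-transformer embeddings of each reasoning step.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: a zero vector counts as dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def dtw_distance(seq_a, seq_b):
    """Classic O(n*m) Dynamic Time Warping cost between two sequences
    of step embeddings, using cosine distance as the local cost."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = cosine_distance(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def trajectory_reward(generated, reference):
    """Hypothetical mapping of DTW distance to a reward in (0, 1]."""
    if not generated or not reference:
        return 0.0
    return 1.0 / (1.0 + dtw_distance(generated, reference))


def combined_reward(generated, reference, answer, gold, weight=0.5):
    """Exact-match reward plus a weighted process reward.
    The additive form and the default weight are assumptions."""
    exact = 1.0 if answer.strip().lower() == gold.strip().lower() else 0.0
    return exact + weight * trajectory_reward(generated, reference)


# Toy 2-D "embeddings": the generated trajectory repeats its first step.
reference_steps = [[1.0, 0.0], [0.0, 1.0]]
generated_steps = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(dtw_distance(generated_steps, reference_steps))
print(combined_reward(generated_steps, reference_steps, " Yes", "yes"))
```

Because DTW aligns sequences of different lengths, a generated trajectory that repeats or splits a step can still match the reference closely, which a strict step-by-step comparison would penalize.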

Code & Models

Repositories

https://anonymous.4open.science/r/MICCAI-R1-MED-VQA-code-B14B