TL;DR
This paper introduces a lightweight Thought Graph Traversal framework that enhances chest X-ray report generation by guiding reasoning through organ-specific findings, improving accuracy without retraining the model.
Contribution
The authors propose a test-time reasoning method that builds structured medical priors into the prompt and uses reasoning budget forcing to adjust inference depth dynamically, with no changes to the underlying model.
Findings
Outperforms baseline prompting approaches on standard benchmarks.
Reveals dataset biases through traceable reasoning paths.
Enables a frozen VLLM to self-correct and produce more accurate reports.
Abstract
Test-time scaling offers a promising way to improve the reasoning performance of vision-language large models (VLLMs) without additional training. In this paper, we explore a simple but effective approach for applying test-time scaling to chest X-ray report generation. Specifically, we introduce a lightweight Thought Graph Traversal (TGT) framework that guides the model to reason through organ-specific findings in a medically coherent order. This framework integrates structured medical priors into the prompt, enabling deeper and more logical analysis with no changes to the underlying model. To further enhance reasoning depth, we apply a reasoning budget forcing strategy that adjusts the model's inference depth at test time by dynamically extending its generation process. This simple yet powerful combination allows a frozen radiology VLLM to self-correct and generate more accurate,…
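The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The organ graph, the prompt wording, the `generate` callable, the "Wait" continuation cue, and the use of word count as a stand-in for token count are all assumptions made for illustration. The first part orders organ-specific findings so that each organ is visited only after the organs it depends on. The second part re-invokes a frozen model with a continuation cue while its output is shorter than a minimum budget.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# Hypothetical organ graph (an assumption, not taken from the paper):
# each organ maps to the organs that should be examined before it.
ORGAN_GRAPH: Dict[str, List[str]] = {
    "lungs": [],
    "pleura": ["lungs"],
    "heart": ["lungs"],
    "mediastinum": ["heart"],
    "bones": ["pleura", "mediastinum"],
}


def traversal_order(graph: Dict[str, List[str]]) -> List[str]:
    """Return an organ order in which every prerequisite comes first."""
    return list(TopologicalSorter(graph).static_order())


def build_prompt(order: List[str]) -> str:
    """Turn the traversal order into numbered reasoning steps (illustrative wording)."""
    steps = [f"{i}. Describe the findings for: {organ}." for i, organ in enumerate(order, 1)]
    return "Examine the chest X-ray step by step.\n" + "\n".join(steps)


def generate_with_budget(
    generate: Callable[[str], str],  # stand-in for a frozen model's text generation
    prompt: str,
    min_words: int,
    max_extensions: int = 2,
    cue: str = "Wait",  # assumed continuation cue
) -> str:
    """Extend generation while the output is shorter than the minimum budget."""
    text = generate(prompt)
    extensions = 0
    while len(text.split()) < min_words and extensions < max_extensions:
        # Append the cue, then ask the model to continue from its own output.
        text = text + "\n" + cue + ", "
        text += generate(prompt + "\n" + text)
        extensions += 1
    return text


if __name__ == "__main__":
    order = traversal_order(ORGAN_GRAPH)
    prompt = build_prompt(order)
    # A stub generator stands in for the model so the sketch runs on its own.
    print(generate_with_budget(lambda p: "No acute findings.", prompt, min_words=6))
```

Capping the number of extensions keeps the loop bounded even if the model keeps returning short outputs. A real setup would count tokens with the model's own tokenizer rather than splitting on whitespace.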
