Unleashing Video Language Models for Fine-grained HRCT Report Generation
Yingying Fang, Huichi Zhou, KinHei Lee, Yijia Wang, Zhenxuan Zhang, Jiahao Huang, Guang Yang

TL;DR
This paper introduces AbSteering, a framework that adapts Video Language Models for detailed HRCT report generation by emphasizing abnormality reasoning and discrimination, outperforming existing CT models.
Contribution
The paper proposes AbSteering, a novel abnormality-centric approach that guides VideoLMs for precise HRCT report generation, incorporating a Chain-of-Thought scheme and a preference optimization objective.
Findings
AbSteering improves abnormality detection sensitivity.
It reduces hallucinations compared to existing models.
General-purpose VideoLMs transfer effectively to medical imaging.
Abstract
Generating precise diagnostic reports from High-Resolution Computed Tomography (HRCT) is critical for clinical workflow, yet it remains a formidable challenge due to the high pathological diversity and spatial sparsity within 3D volumes. While Video Language Models (VideoLMs) have demonstrated remarkable spatio-temporal reasoning in general domains, their adaptability to domain-specific, high-volume medical interpretation remains underexplored. In this work, we present AbSteering, an abnormality-centric framework that steers VideoLMs toward precise HRCT report generation. Specifically, AbSteering introduces: (i) an abnormality-centric Chain-of-Thought scheme that enforces abnormality reasoning, and (ii) a Direct Preference Optimization objective that utilizes clinically confusable abnormalities as hard negatives to enhance fine-grained discrimination. Our results demonstrate that…
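The preference objective in (ii) can be pictured with the standard Direct Preference Optimization loss applied to a report pair. The sketch below is only an illustration under stated assumptions: the exact objective used by AbSteering is not given in this summary, and the `CONFUSABLE` mapping, the `make_hard_negative` helper, and the `beta` value are hypothetical names and values chosen for the example, not taken from the paper.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair, given sequence log-probabilities."""
    # Implicit rewards: how much more the policy favors each report than the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) computed as a numerically stable softplus(-margin).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Hypothetical confusable pair, for illustration only (not from the paper).
CONFUSABLE = {"ground-glass opacity": "consolidation"}

def make_hard_negative(report, confusable=CONFUSABLE):
    """Build a rejected report by swapping a finding for a look-alike abnormality."""
    for finding, lookalike in confusable.items():
        if finding in report:
            return report.replace(finding, lookalike)
    return None  # no known confusable finding, so no hard negative is produced

chosen = "Patchy ground-glass opacity in the right lower lobe."
rejected = make_hard_negative(chosen)
# The log-probabilities below are placeholders; in practice they come from the models.
loss = dpo_loss(-1.0, -5.0, -2.0, -2.0)
```

The loss shrinks as the policy assigns relatively more probability to the correct report than to the confusable variant, which is the sense in which hard negatives would push the model toward fine-grained discrimination.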
Taxonomy
Topics: Multimodal Machine Learning Applications · Machine Learning in Healthcare · Topic Modeling
