Unleashing Video Language Models for Fine-grained HRCT Report Generation

Yingying Fang; Huichi Zhou; KinHei Lee; Yijia Wang; Zhenxuan Zhang; Jiahao Huang; Guang Yang

arXiv:2603.12469·cs.CV·March 24, 2026

TL;DR

This paper introduces AbSteering, a framework that adapts Video Language Models for detailed HRCT report generation by emphasizing abnormality reasoning and discrimination, outperforming existing CT models.

Contribution

The paper proposes AbSteering, a novel abnormality-centric approach that guides VideoLMs for precise HRCT report generation, incorporating a Chain-of-Thought scheme and a preference optimization objective.

Findings

01. AbSteering improves abnormality detection sensitivity.
02. It reduces hallucinations compared to existing models.
03. General-purpose VideoLMs transfer effectively to medical imaging.

Abstract

Generating precise diagnostic reports from High-Resolution Computed Tomography (HRCT) is critical for clinical workflow, yet it remains a formidable challenge due to the high pathological diversity and spatial sparsity within 3D volumes. While Video Language Models (VideoLMs) have demonstrated remarkable spatio-temporal reasoning in general domains, their adaptability to domain-specific, high-volume medical interpretation remains underexplored. In this work, we present AbSteering, an abnormality-centric framework that steers VideoLMs toward precise HRCT report generation. Specifically, AbSteering introduces: (i) an abnormality-centric Chain-of-Thought scheme that enforces abnormality reasoning, and (ii) a Direct Preference Optimization objective that utilizes clinically confusable abnormalities as hard negatives to enhance fine-grained discrimination. Our results demonstrate that…
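To make component (ii) more concrete, here is a minimal sketch of the standard Direct Preference Optimization loss applied to a (reference report, hard-negative report) pair. It is not the authors' implementation. The confusion map `CONFUSABLE`, the helper `make_hard_negative`, and the choice `beta=0.1` are all hypothetical and used only for illustration. The abstract does not say how confusable abnormalities are selected or substituted.

```python
import math
from typing import Optional

# Hypothetical confusion map, purely illustrative and NOT taken from the paper.
CONFUSABLE = {
    "ground-glass opacity": "consolidation",
    "honeycombing": "traction bronchiectasis",
}


def make_hard_negative(report: str) -> Optional[str]:
    """Build a rejected report by swapping one abnormality for a confusable one.

    Returns None when the report mentions no term from the (assumed) map.
    """
    for term, confuser in CONFUSABLE.items():
        if term in report:
            return report.replace(term, confuser)
    return None


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; the abstract does not specify it
) -> float:
    """Standard DPO loss for one preference pair, given sequence log-probs.

    loss = -log(sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)]))
    """
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    # -log(sigmoid(m)) equals softplus(-m); this form is numerically stable.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


chosen = "Bilateral ground-glass opacity in the lower lobes."
rejected = make_hard_negative(chosen)
# rejected is the same sentence with the abnormality term swapped.
```

The loss shrinks as the policy assigns relatively more probability to the reference report than to the hard negative, measured against a frozen reference model. In practice the four log-probabilities would come from summing token log-likelihoods of each report under the trained VideoLM and its reference copy.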

Taxonomy

Topics: Multimodal Machine Learning Applications · Machine Learning in Healthcare · Topic Modeling