Visual question answering-based image-finding generation for pulmonary nodules on chest CT from structured annotations

Maiko Nagao; Kaito Urata; Atsushi Teramoto; Kazuyoshi Imaizumi; Masashi Kondo; Hiroshi Fujita

arXiv:2601.11075·eess.IV·January 19, 2026

Open Access

TL;DR

This study developed a visual question answering (VQA) system that generates image findings for pulmonary nodules on chest CT. Built from structured annotations in the LIDC-IDRI dataset, it supports interactive, question-driven diagnosis.

Contribution

The paper introduces a VQA dataset for chest CT images and demonstrates a method that generates image findings in response to physicians' questions.

Findings

01 High CIDEr score of 3.896 for generated findings

02 High agreement with reference findings based on morphological characteristics

03 Effective as an interactive diagnostic support system

Abstract

Interpreting imaging findings based on morphological characteristics is important for diagnosing pulmonary nodules on chest computed tomography (CT) images. We constructed a visual question answering (VQA) dataset from structured data in an open dataset and investigated an image-finding generation method for chest CT images. The aim is to enable interactive diagnostic support that presents findings in response to questions reflecting physicians' interests, rather than fixed descriptions. We used chest CT images from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) dataset. Regions of interest surrounding the pulmonary nodules were extracted from these images, and image findings and questions were defined based on morphological characteristics recorded in the database. A dataset comprising pairs of cropped…
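The step of defining questions and findings from recorded morphological characteristics can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the `NoduleAnnotation` class, the three characteristics chosen, the 1 to 5 rating scales and their direction, and the question and answer wording are hypothetical, and the paper's actual templates are not given in this excerpt.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class NoduleAnnotation:
    """Structured ratings for one nodule (hypothetical subset of characteristics)."""
    margin: int       # assumed scale: 1 (poorly defined) .. 5 (sharp)
    spiculation: int  # assumed scale: 1 (none) .. 5 (marked)
    texture: int      # assumed scale: 1 (non-solid) .. 5 (solid)


# Hypothetical question templates, one per characteristic.
QUESTIONS: Dict[str, str] = {
    "margin": "How is the margin of the nodule?",
    "spiculation": "Is spiculation present?",
    "texture": "What is the texture of the nodule?",
}

# Hypothetical answer phrases for low (1-2), middle (3), and high (4-5) ratings.
PHRASES: Dict[str, Tuple[str, str, str]] = {
    "margin": ("The margin is poorly defined.",
               "The margin is moderately defined.",
               "The margin is sharp."),
    "spiculation": ("No spiculation is seen.",
                    "Mild spiculation is seen.",
                    "Marked spiculation is seen."),
    "texture": ("The nodule is non-solid.",
                "The nodule is part-solid.",
                "The nodule is solid."),
}


def describe(characteristic: str, rating: int) -> str:
    """Map a 1-5 rating to one of three finding phrases."""
    low, middle, high = PHRASES[characteristic]
    if rating <= 2:
        return low
    if rating == 3:
        return middle
    return high


def build_qa_pairs(annotation: NoduleAnnotation) -> List[Tuple[str, str]]:
    """Return (question, finding) pairs for one annotated nodule."""
    return [
        (question, describe(name, getattr(annotation, name)))
        for name, question in QUESTIONS.items()
    ]


if __name__ == "__main__":
    example = NoduleAnnotation(margin=5, spiculation=1, texture=3)
    for question, finding in build_qa_pairs(example):
        print(f"Q: {question}\nA: {finding}")
```

In a dataset built this way, each (question, finding) pair would be attached to the cropped region of interest for the same nodule, so a model can be trained to answer the question from the image.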

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Topic Modeling