PolySmart @ TRECVid 2024 Medical Video Question Answering

Jiaxin Wu; Yiyang Jiang; Xiao-Yong Wei; Qing Li

arXiv:2412.15514·cs.CV·December 23, 2024

Open Access

TL;DR

This paper presents a system for medical video question answering that combines text-to-text retrieval, visual answer localization, and instructional step captioning with GPT-4. Its single submitted run to the TRECVid 2024 challenge obtains an F-score of 11.92 and a mean IoU of 9.6527.

Contribution

The system integrates GPT-4 and LLaVA-Next-Video for medical video retrieval, visual answer localization, and step captioning. A single run was submitted to TRECVid 2024.

Findings

01

Achieved an F-score of 11.92 on the QFISC task.

02

Obtained a mean IoU of 9.6527 for answer localization.

03

Demonstrated the effectiveness of GPT-4 in medical video QA.
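
For readers unfamiliar with the localization metric in the findings above, the following is a minimal sketch of how a temporal IoU and its mean are commonly computed for predicted and reference time spans. The function names `temporal_iou` and `mean_iou` are illustrative, spans are assumed to be (start, end) pairs in seconds, and this is not the official TRECVid scoring script. The reported 9.6527 is presumably on a percentage scale, whereas this sketch returns values in [0, 1].

```python
def temporal_iou(pred, gold):
    """Intersection over union of two (start, end) time spans, in seconds."""
    pred_start, pred_end = pred
    gold_start, gold_end = gold
    # Overlap is zero when the spans do not intersect.
    intersection = max(0.0, min(pred_end, gold_end) - max(pred_start, gold_start))
    union = (pred_end - pred_start) + (gold_end - gold_start) - intersection
    return intersection / union if union > 0 else 0.0


def mean_iou(preds, golds):
    """Average temporal IoU over paired predictions and references."""
    scores = [temporal_iou(p, g) for p, g in zip(preds, golds)]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, a predicted span of 0 to 10 seconds against a reference span of 5 to 15 seconds overlaps for 5 seconds out of a 15-second union, giving an IoU of one third.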

Abstract

Video Corpus Visual Answer Localization (VCVAL) includes question-related video retrieval and visual answer localization in the videos. Specifically, we use text-to-text retrieval to find relevant videos for a medical question based on the similarity between video transcripts and answers generated by GPT-4. For the visual answer localization, the start and end timestamps of the answer are predicted by aligning both the visual content and the subtitles with the query. For the Query-Focused Instructional Step Captioning (QFISC) task, the step captions are generated by GPT-4. Specifically, we provide the video captions generated by the LLaVA-Next-Video model and the video subtitles with timestamps as context, and ask GPT-4 to generate step captions for the given medical query. We submit only one run for evaluation, and it obtains an F-score of 11.92 and a mean IoU of 9.6527.
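
The text-to-text retrieval step described in the abstract can be illustrated with a small sketch. The abstract does not say which similarity model is used, so this example substitutes a plain bag-of-words cosine similarity as a stand-in. The GPT-4 answer is passed in as an ordinary string (no API call is made), and the names `tokenize`, `cosine`, and `rank_videos` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_videos(generated_answer, transcripts):
    """Rank videos by similarity of their transcript to a generated answer.

    generated_answer: text assumed to come from an LLM such as GPT-4.
    transcripts: dict mapping a video id to its transcript text.
    Returns a list of (video_id, score) pairs, best match first.
    """
    query_vec = Counter(tokenize(generated_answer))
    scored = [
        (video_id, cosine(query_vec, Counter(tokenize(text))))
        for video_id, text in transcripts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The design idea sketched here is that the generated answer, rather than the short question, serves as the query, which gives the matcher more vocabulary to compare against the transcript. A dense sentence-embedding model could replace the bag-of-words vectors without changing the ranking logic.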

Taxonomy

Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Machine Learning in Healthcare