Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches

Deepak Gupta, Kush Attal, and Dina Demner-Fushman

arXiv:2309.12224·cs.CL·September 22, 2023

Open Access

TL;DR

This paper introduces large-scale medical video datasets and approaches for automatically providing visual answers to health-related questions, aiming to improve access to medical knowledge through instructional videos.

Contribution

The paper presents a novel pipeline for creating large-scale medical video datasets and develops monomodal and multimodal methods for retrieving visual answers from videos.

Findings

01

Datasets improve model training for medical visual answer localization

02

Visual features significantly improve the performance of the proposed approaches

03

Pre-trained language-vision models offer promising future improvements

Abstract

The increase in the availability of online videos has transformed the way we access information and knowledge. A growing number of individuals now prefer instructional videos as they offer a series of step-by-step procedures to accomplish particular tasks. The instructional videos from the medical domain may provide the best possible visual answers to first aid, medical emergency, and medical education questions. Toward this, this paper is focused on answering health-related questions asked by the public by providing visual answers from medical videos. The scarcity of large-scale datasets in the medical domain is a key challenge that hinders the development of applications that can help the public with their health-related questions. To address this issue, we first proposed a pipelined approach to create two large-scale datasets: HealthVidQA-CRF and HealthVidQA-Prompt. Later, we…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques