Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches
Deepak Gupta, Kush Attal, and Dina Demner-Fushman

TL;DR
This paper introduces large-scale medical video datasets and approaches for automatically providing visual answers to health-related questions, aiming to improve access to medical knowledge through instructional videos.
Contribution
It presents a novel pipeline for creating large-scale medical video datasets and develops monomodal and multimodal methods for visual answer retrieval from videos.
Findings
The proposed datasets improve model training for medical visual answer localization
Visual features significantly improve the performance of the proposed approaches
Pre-trained language-vision models are a promising direction for further improvement
Abstract
The increasing availability of online videos has transformed the way we access information and knowledge. A growing number of individuals now prefer instructional videos because they present step-by-step procedures for accomplishing particular tasks. Instructional videos from the medical domain may provide the best possible visual answers to first aid, medical emergency, and medical education questions. To this end, this paper focuses on answering health-related questions asked by the public by providing visual answers from medical videos. The scarcity of large-scale datasets in the medical domain is a key challenge that hinders the development of applications that can help the public with their health-related questions. To address this issue, we first propose a pipelined approach to create two large-scale datasets: HealthVidQA-CRF and HealthVidQA-Prompt. Later, we…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
