Learning to Locate Visual Answer in Video Corpus Using Question

Bin Li; Yixuan Weng; Bin Sun; Shutao Li

arXiv:2210.05423·cs.CV·September 27, 2023

TL;DR

This paper introduces VCVAL, a new task for locating visual answers in large untrimmed video collections using natural language questions, and proposes a novel cross-modal contrastive method to improve performance.

Contribution

The paper presents a new task, VCVAL, along with the cross-modal contrastive global-span (CCGS) method and a reconstructed dataset, MedVidCQA, on which the task is benchmarked.

Findings

01. The proposed CCGS method outperforms other competitive methods on both the video corpus retrieval and visual answer localization subtasks.

02. The reconstructed MedVidCQA dataset serves as the benchmark for VCVAL.

03. Extensive experiments validate the effectiveness of the approach.

Abstract

We introduce a new task, named video corpus visual answer localization (VCVAL), which aims to locate the visual answer in a large collection of untrimmed instructional videos using a natural language question. This task requires a range of skills: the interaction between vision and language, video retrieval, passage comprehension, and visual answer localization. In this paper, we propose a cross-modal contrastive global-span (CCGS) method for VCVAL, jointly training the video corpus retrieval and visual answer localization subtasks with the global-span matrix. We have reconstructed a dataset named MedVidCQA, on which the VCVAL task is benchmarked. Experimental results show that the proposed method outperforms other competitive methods in both the video corpus retrieval and visual answer localization subtasks. Most importantly, we perform detailed analyses on extensive experiments,…
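To make the two subtasks named in the abstract more concrete, the following is a minimal, illustrative sketch and not the authors' CCGS implementation. It assumes that questions and videos have already been encoded as fixed-size vectors, uses a generic one-directional InfoNCE loss as a stand-in for the cross-modal contrastive objective, and uses plain start/end score selection as a stand-in for span localization (the paper's global-span matrix is not reproduced here). All function names (`info_nce`, `retrieve`, `best_span`) and the temperature value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(question_vecs, video_vecs, temperature=0.1):
    """Question-to-video InfoNCE loss (assumed objective, not the paper's).

    Question i is treated as matching video i; every other video in the
    batch acts as a negative.
    """
    losses = []
    for i, q in enumerate(question_vecs):
        logits = [cosine(q, v) / temperature for v in video_vecs]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


def retrieve(question_vec, video_vecs):
    """Video corpus retrieval: index of the most similar video."""
    scores = [cosine(question_vec, v) for v in video_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


def best_span(start_scores, end_scores):
    """Answer localization: (start, end) with start <= end that maximizes
    start_scores[start] + end_scores[end], found in a single pass."""
    best = (0, 0)
    best_score = float("-inf")
    best_start = 0  # best start index seen so far (<= current end)
    for end in range(len(end_scores)):
        if start_scores[end] > start_scores[best_start]:
            best_start = end
        score = start_scores[best_start] + end_scores[end]
        if score > best_score:
            best_score = score
            best = (best_start, end)
    return best


# Toy data: two questions, two videos, hand-written 2-d "embeddings".
questions = [[1.0, 0.0], [0.0, 1.0]]
videos = [[0.9, 0.1], [0.1, 0.9]]

aligned_loss = info_nce(questions, videos)
shuffled_loss = info_nce(questions, videos[::-1])
top_video = retrieve(questions[0], videos)
span = best_span([0.1, 2.0, 0.3, 0.2], [0.0, 0.1, 1.5, 0.4])
```

In this toy setup the loss is lower when each question is paired with its matching video than when the pairing is reversed, which is the behavior a contrastive retrieval objective is meant to encourage. The span selection then runs on the retrieved video only, reflecting the retrieve-then-localize structure of the task.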

Code & Models

Repositories

wengsyx/ccgs
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Learning