The Solution for the ICCV 2023 Perception Test Challenge 2023 -- Task 6 -- Grounded videoQA

Hailiang Zhang; Dian Chao; Zhihao Guan; Yang Yang

arXiv:2407.01907·cs.CV·July 3, 2024

TL;DR

This paper presents a grounded video question-answering method for the ICCV 2023 challenge, combining VALOR and TubeDETR models to improve object tracking and visual grounding in videos.

Contribution

It introduces a two-stage approach that uses VALOR to answer the question from the video and TubeDETR to generate bounding boxes for the target, addressing the baseline's reliance on selected frames that may lack clearly identifiable target objects.

Findings

1. Enhanced accuracy in visual grounding and object tracking.

2. Effective handling of questions involving object movement over time.

3. Improved performance on the ICCV 2023 Perception Test Challenge.

Abstract

In this paper, we introduce a grounded video question-answering solution. Our research reveals that the fixed official baseline method for video question answering involves two main steps: visual grounding and object tracking. However, a significant challenge emerges during the initial step, where the selected frames may lack clearly identifiable target objects. Furthermore, a single image cannot address questions such as "Track the container from which the person pours the first time." To tackle these issues, we propose an alternative two-stage approach: (1) we leverage the VALOR model to answer questions based on video information; (2) we concatenate each question with its answer and employ TubeDETR to generate bounding boxes for the targets.
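The two-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `answer_fn` and `ground_fn` are hypothetical callables standing in for VALOR and TubeDETR respectively, the space-separated concatenation format is an assumption, and the box convention `(x1, y1, x2, y2)` is chosen only for the example.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical conventions for this sketch only.
Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Tube = Dict[int, Box]                     # frame index -> bounding box


def build_grounding_query(question: str, answer: str) -> str:
    """Concatenate a question with its answer into one text query.

    The separator (a single space) is an assumption; the source only
    states that questions and answers are concatenated.
    """
    return f"{question.strip()} {answer.strip()}"


def grounded_video_qa(
    frames: Sequence[object],
    question: str,
    answer_fn: Callable[[Sequence[object], str], str],
    ground_fn: Callable[[Sequence[object], str], Tube],
) -> Tuple[str, Tube]:
    """Run the two stages in order.

    answer_fn: stand-in for a video QA model (VALOR in the paper).
    ground_fn: stand-in for a spatio-temporal grounding model
               (TubeDETR in the paper).
    """
    # Stage 1: answer the question from the whole video, not one frame.
    answer = answer_fn(frames, question)
    # Stage 2: ground the question-plus-answer text across the video.
    query = build_grounding_query(question, answer)
    tube = ground_fn(frames, query)
    return answer, tube
```

Passing the models in as callables keeps the sketch independent of either model's real interface, which is not described in this summary.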

Taxonomy

Topics: Advanced Neural Network Applications