Technical Report for CVPR 2022 LOVEU AQTC Challenge

Hyeonyu Kim, Jongeun Kim, Jeonghun Kang, Sanguk Park, Dongchan Park, and Taehwan Kim

arXiv:2206.14555·cs.CV·June 30, 2022

TL;DR

This technical report details the development of a top-performing model for the AQTC task in CVPR 2022 LOVEU, introducing a novel attention mechanism to handle multi-modal, multi-step video question answering challenges.

Contribution

The paper proposes a new context ground module attention mechanism and provides comprehensive analysis and ablation studies for multi-modal video question answering.

Findings

1. Achieved 2nd place overall in LOVEU challenge track 3
2. Secured 1st place in two evaluation metrics
3. Demonstrated effectiveness of the proposed attention mechanism

Abstract

This technical report presents the second-place model for AQTC, a task newly introduced in the CVPR 2022 LOng-form VidEo Understanding (LOVEU) challenges. The task is difficult because it involves multi-step answers, multi-modal inputs, and diverse, changing button representations in video. We address these difficulties by proposing a new context ground module attention mechanism for more effective feature mapping. We also analyze the effect of the number of buttons and run ablation studies over different step networks and video features. As a result, we placed 2nd overall in track 3 of the LOVEU competition and 1st on two of the four evaluation metrics. Our code is available at https://github.com/jaykim9870/CVPR-22_LOVEU_unipyler.

Code & Models

Repositories

jaykim9870/cvpr-22_loveu_unipyler
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning