GroundNLQ @ Ego4D Natural Language Queries Challenge 2023

Zhijian Hou; Lei Ji; Difei Gao; Wanjun Zhong; Kun Yan; Chao Li; Wing-Kwong Chan; Chong-Wah Ngo; Nan Duan; Mike Zheng Shou

arXiv:2306.15255 · cs.CV · June 28, 2023 · 1 citation

TL;DR

This paper presents GroundNLQ, a novel multi-modal grounding model for egocentric videos, achieving state-of-the-art results in the Ego4D NLQ Challenge 2023 through a two-stage pre-training and fine-tuning strategy.

Contribution

Introduction of GroundNLQ, a multi-scale multi-modal grounding model with a two-stage training approach for egocentric video-language understanding.

Findings

01

On the blind test set of the Ego4D NLQ Challenge 2023, GroundNLQ achieves 25.67 for R1@IoU=0.3 and 18.18 for R1@IoU=0.5, surpassing all other teams.

02

The two-stage pre-training and fine-tuning strategy improves grounding accuracy.

03

GroundNLQ effectively handles long videos with multi-scale temporal modeling.

Abstract

In this report, we present our champion solution for the Ego4D Natural Language Queries (NLQ) Challenge at CVPR 2023. To accurately ground a query in a video, an effective egocentric feature extractor and a powerful grounding model are required. Motivated by this, we leverage a two-stage pre-training strategy to train egocentric feature extractors and the grounding model on video narrations, and further fine-tune the model on annotated data. In addition, we introduce a novel grounding model, GroundNLQ, which employs a multi-modal multi-scale grounding module for effective video and text fusion and for handling various temporal intervals, especially in long videos. On the blind test set, GroundNLQ achieves 25.67 and 18.18 for R1@IoU=0.3 and R1@IoU=0.5, respectively, and surpasses all other teams by a noticeable margin. Our code will be released at https://github.com/houzhijian/GroundNLQ.
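
The R1@IoU scores quoted in the abstract can be read as follows: for each query, the top-ranked predicted time window is compared with the annotated window, and the query counts as a hit when their temporal intersection-over-union reaches the threshold. The sketch below illustrates that idea only. It is not the official Ego4D evaluation code or the authors' implementation; the function names, the (start, end) tuple layout in seconds, the use of `>=` at the threshold, and the toy segments are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed layout: a temporal segment is (start_sec, end_sec).
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Segment], gts: List[Segment],
                iou_threshold: float) -> float:
    """Percentage of queries whose top-1 prediction meets the IoU threshold."""
    if len(top1_preds) != len(gts):
        raise ValueError("need exactly one top-1 prediction per query")
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= iou_threshold
               for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


# Toy data (hypothetical, not from the challenge): four queries.
preds = [(2.0, 6.0), (10.0, 12.0), (0.0, 1.0), (20.0, 30.0)]
gts = [(3.0, 7.0), (10.0, 15.0), (5.0, 6.0), (20.0, 30.0)]

r1_at_03 = recall_at_1(preds, gts, 0.3)  # IoUs 0.6, 0.4, 0.0, 1.0 -> 3 of 4 hits
r1_at_05 = recall_at_1(preds, gts, 0.5)  # only 0.6 and 1.0 pass -> 2 of 4 hits
print(r1_at_03, r1_at_05)  # 75.0 50.0
```

The stricter threshold can only keep or lower the score, which matches the ordering of the two reported numbers (25.67 at IoU=0.3 versus 18.18 at IoU=0.5).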

Code & Models

Repositories

houzhijian/groundnlq
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling