An Efficient COarse-to-fiNE Alignment Framework @ Ego4D Natural Language Queries Challenge 2022
Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing-Kwong Chan, Chong-Wah Ngo, Zheng Shou, Nan Duan

TL;DR
This paper introduces CONE, an efficient coarse-to-fine video alignment framework for natural language queries in videos, leveraging dynamic window slicing, contrastive learning, and pre-trained multi-modal models to improve retrieval accuracy.
Contribution
The paper presents a novel window-centric alignment framework that enhances efficiency and accuracy in video-language retrieval tasks for ego-centric videos.
Findings
Achieved R1@IoU=0.3 of 15.26 on the blind test set of the Ego4D NLQ challenge.
Achieved R1@IoU=0.5 of 9.24 on the same blind test set.
Demonstrated effective coarse-to-fine alignment using contrastive learning and pre-trained models.
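The R1@IoU figures above are top-1 recall at a temporal IoU threshold: a query counts as a hit when its highest-ranked predicted moment overlaps the ground-truth moment with IoU at or above the threshold. The following is a minimal sketch of that metric, not the official challenge evaluation code; the function names `temporal_iou` and `recall_at_1` are illustrative, and moments are assumed to be `(start, end)` pairs in seconds.

```python
def temporal_iou(pred, gt):
    """Intersection-over-union of two (start, end) time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds, gts, threshold):
    """Percentage of queries whose top-1 prediction reaches the IoU threshold.

    top1_preds and gts are equal-length lists of (start, end) pairs
    (an assumed input layout, chosen for illustration).
    """
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)


# Two queries: one exact match, one prediction that overlaps by a third.
preds = [(0.0, 10.0), (0.0, 10.0)]
truth = [(0.0, 10.0), (5.0, 15.0)]
r_03 = recall_at_1(preds, truth, 0.3)  # both predictions count as hits
r_05 = recall_at_1(preds, truth, 0.5)  # only the exact match counts
```

Raising the threshold from 0.3 to 0.5 can only lower the score, which is consistent with the two reported numbers.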
Abstract
This technical report describes the CONE approach for the Ego4D Natural Language Queries (NLQ) Challenge at ECCV 2022. We leverage our model CONE, an efficient window-centric COarse-to-fiNE alignment framework. Specifically, CONE dynamically slices the long video into candidate windows via a sliding window approach. Centered on these windows, CONE (1) learns the inter-window (coarse-grained) semantic variance through contrastive learning and speeds up inference by pre-filtering the candidate windows relevant to the NL query, and (2) ranks intra-window (fine-grained) candidate moments using the powerful multi-modal alignment ability of the contrastive vision-text pre-trained model EgoVLP. On the blind test set, CONE achieves 15.26 and 9.24 for R1@IoU=0.3 and R1@IoU=0.5, respectively.
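The coarse stage described in the abstract (sliding-window slicing followed by query-based pre-filtering) can be sketched roughly as follows. This is an illustration under stated assumptions, not CONE's implementation: the window length, stride, and `top_k` are hypothetical parameters, and the window score used here (maximum cosine similarity between any clip feature and the query feature) is a simple stand-in for the contrastively trained matching the report describes.

```python
import math


def slice_windows(num_clips, window_len, stride):
    """Return (start, end) clip-index windows covering the whole video."""
    if num_clips <= window_len:
        return [(0, num_clips)]
    starts = list(range(0, num_clips - window_len + 1, stride))
    if starts[-1] + window_len < num_clips:
        starts.append(num_clips - window_len)  # extra window to cover the tail
    return [(s, s + window_len) for s in starts]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def prefilter_windows(clip_feats, query_feat, windows, top_k):
    """Keep the top_k windows by an assumed score: max clip-query similarity."""
    scored = []
    for start, end in windows:
        score = max(cosine(clip_feats[i], query_feat) for i in range(start, end))
        scored.append((score, (start, end)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [window for _, window in scored[:top_k]]


# Toy video of 10 clips with 2-d features; only clip 8 matches the query.
clip_feats = [[1.0, 0.0] for _ in range(10)]
clip_feats[8] = [0.0, 1.0]
query_feat = [0.0, 1.0]

windows = slice_windows(num_clips=10, window_len=4, stride=2)
kept = prefilter_windows(clip_feats, query_feat, windows, top_k=1)
```

Only the windows that survive this filter would then go through the more expensive fine-grained moment ranking, which is where the inference speed-up comes from.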
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
Methods: Test · Contrastive Learning
