Coarse to Fine: Video Retrieval before Moment Localization

Zijian Gao; Huanyu Liu; Jingyu Liu

arXiv:2110.07201 · cs.CV · October 15, 2021 · 1 citation

TL;DR

This paper proposes a hybrid approach combining feature alignment and fusion to improve video corpus moment retrieval, addressing limitations of existing similarity-based methods.

Contribution

It introduces a novel method that integrates feature fusion with alignment, enhancing retrieval accuracy over traditional similarity-only approaches.

Findings

1. Improved retrieval performance demonstrated on benchmark datasets.

2. Fusion-enhanced method outperforms cosine similarity alignment alone.

3. Addresses limitations of late fusion methods in VCMR.

Abstract

Current state-of-the-art methods for video corpus moment retrieval (VCMR) often use a similarity-based feature alignment approach for convenience and speed. However, late fusion methods such as cosine similarity alignment cannot make full use of the information in both the query text and the videos. In this paper, we combine feature alignment with feature fusion to improve performance on VCMR.
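To make the contrast in the abstract concrete, here is a minimal sketch of a coarse-to-fine pipeline: cosine similarity alignment first narrows the corpus to a few candidate videos, then a joint (fusion) scorer ranks clips inside those candidates. It is not the authors' implementation. The function names, the toy vectors, and the weighted element-wise product standing in for a fusion module are all assumptions made for illustration; a real system would use learned encoders and a learned cross-modal network.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def coarse_retrieve(query_vec, video_vecs, k):
    """Alignment stage: rank videos by cosine similarity and keep the top k."""
    ranked = sorted(video_vecs,
                    key=lambda vid: cosine(query_vec, video_vecs[vid]),
                    reverse=True)
    return ranked[:k]

def fused_clip_score(query_vec, clip_vec, weights):
    """Hypothetical fusion scorer: weighted sum of element-wise products.
    Stands in for a learned module that sees query and clip jointly."""
    return sum(w * q * c for w, q, c in zip(weights, query_vec, clip_vec))

def fine_localize(query_vec, clips_by_video, candidates, weights):
    """Fusion stage: score clips of candidate videos only and return
    (video_id, clip_index, score) for the best clip."""
    best = None
    for vid in candidates:
        for idx, clip in enumerate(clips_by_video[vid]):
            score = fused_clip_score(query_vec, clip, weights)
            if best is None or score > best[2]:
                best = (vid, idx, score)
    return best

# Toy data (assumed): 2-d embeddings for one query, three videos, and their clips.
query = [1.0, 0.0]
video_vecs = {"v1": [0.9, 0.1], "v2": [0.0, 1.0], "v3": [0.7, 0.7]}
clips_by_video = {
    "v1": [[0.2, 0.9], [0.8, 0.1]],
    "v2": [[0.0, 1.0]],
    "v3": [[0.5, 0.5], [0.9, 0.3]],
}
weights = [1.0, 1.0]

candidates = coarse_retrieve(query, video_vecs, k=2)
best = fine_localize(query, clips_by_video, candidates, weights)
```

In this toy run the cheap alignment stage discards `v2` before the fusion scorer runs, so the more expensive joint scoring touches only the clips of the two remaining videos. That division of labor is the speed-versus-accuracy trade-off the abstract describes.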

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization