Coarse to Fine: Video Retrieval before Moment Localization
Zijian Gao, Huanyu Liu, Jingyu Liu

TL;DR
This paper proposes a hybrid approach combining feature alignment and fusion to improve video corpus moment retrieval, addressing limitations of existing similarity-based methods.
Contribution
It introduces a novel method that integrates feature fusion with alignment, enhancing retrieval accuracy over traditional similarity-only approaches.
Findings
Improved retrieval performance demonstrated on benchmark datasets.
Fusion-enhanced method outperforms cosine similarity alignment alone.
Addresses limitations of late fusion methods in VCMR.
Abstract
The current state-of-the-art methods for video corpus moment retrieval (VCMR) often use a similarity-based feature alignment approach for the sake of convenience and speed. However, late fusion methods like cosine similarity alignment are unable to make full use of the information from both query texts and videos. In this paper, we combine feature alignment with feature fusion to improve performance on VCMR.
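The contrast between alignment and fusion drawn in the abstract can be illustrated with a small sketch. This is not the authors' model. It assumes precomputed embeddings, and the function names (`l2_normalize`, `coarse_retrieve`, `fine_localize`) and the attention-style fusion are hypothetical stand-ins chosen for illustration. The coarse stage ranks videos by cosine similarity (late fusion), and the fine stage lets each clip of a retrieved video interact with the query tokens before scoring.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to (approximately) unit length along the given axis.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def coarse_retrieve(query_vec, video_vecs, top_k):
    # Alignment / late fusion: query and videos are encoded independently
    # and only meet in a single cosine-similarity score per video.
    sims = l2_normalize(video_vecs) @ l2_normalize(query_vec)
    order = np.argsort(-sims)[:top_k]
    return order, sims[order]

def fine_localize(query_tokens, clip_feats):
    # Illustrative fusion (an assumption, not the paper's architecture):
    # each clip attends over the query tokens, and the attended query
    # context is combined with the clip feature before scoring.
    d = clip_feats.shape[-1]
    logits = clip_feats @ query_tokens.T / np.sqrt(d)   # (clips, tokens)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)
    context = attn @ query_tokens                        # (clips, dim)
    scores = (clip_feats * context).sum(axis=1)          # one score per clip
    return int(np.argmax(scores)), scores
```

In a coarse-to-fine pipeline of this shape, the cheap similarity step would prune the corpus to `top_k` candidates, so the more expensive fusion step only runs on those few videos.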
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
