See, Rank, and Filter: Important Word-Aware Clip Filtering via Scene Understanding for Moment Retrieval and Highlight Detection
YuEun Lee, Jung Uk Kim

TL;DR
This paper introduces a word-aware clip filtering method for video moment retrieval and highlight detection. It uses scene understanding from Multimodal Large Language Models (MLLMs) to improve semantic comprehension of video clips and to filter clips by their relevance to the query.
Contribution
It presents a fine-grained filtering approach that identifies important words in the query. A feature enhancement module (FEM) captures those words, and a ranking-based filtering module (RFM) refines the video clips according to their relevance to them.
Findings
Significantly outperforms existing methods on the moment retrieval (MR) and highlight detection (HD) tasks
Effective integration of scene understanding improves semantic matching
Demonstrates the importance of word-level filtering in video retrieval
Abstract
Video moment retrieval (MR) and highlight detection (HD) with natural language queries aim to localize relevant moments and key highlights in a video. However, existing methods overlook the importance of individual words, treating the entire text query and video clips as a black box, which hinders contextual understanding. In this paper, we propose a novel approach that enables fine-grained clip filtering by identifying and prioritizing important words in the query. Our method integrates image-text scene understanding through Multimodal Large Language Models (MLLMs) and enhances the semantic understanding of video clips. We introduce a feature enhancement module (FEM) to capture important words from the query and a ranking-based filtering module (RFM) to iteratively refine video clips based on their relevance to these important words. Extensive experiments demonstrate that our…
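The abstract describes two steps: weighting query words by importance, then iteratively ranking clips by their relevance to those words and discarding the weakest. The sketch below illustrates that general idea in plain Python under stated assumptions; it is not the authors' FEM or RFM. It assumes word and clip embeddings already exist in a shared space (from some text and video encoder), and the importance proxy (softmax over each word's best clip match), `keep_ratio`, `rounds`, and `temperature` are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a temperature."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_importance(words: Sequence[Vector], clips: Sequence[Vector],
                    temperature: float = 0.1) -> List[float]:
    """Hypothetical importance proxy: a word matters more if some clip matches it strongly."""
    best = [max(cosine(w, c) for c in clips) for w in words]
    return softmax(best, temperature)


def clip_relevance(words: Sequence[Vector], clips: Sequence[Vector],
                   weights: Sequence[float]) -> List[float]:
    """Importance-weighted word-clip similarity for every clip."""
    return [sum(wt * cosine(w, c) for w, wt in zip(words, weights)) for c in clips]


def rank_and_filter(words: Sequence[Vector], clips: Sequence[Vector],
                    keep_ratio: float = 0.5, rounds: int = 2,
                    temperature: float = 0.1) -> List[int]:
    """Iteratively keep the top `keep_ratio` fraction of clips (hypothetical RFM-like loop).

    Word importance is re-estimated on the surviving clips at each round.
    Returns the original indices of the kept clips in ascending order.
    """
    kept = list(range(len(clips)))
    for _ in range(rounds):
        if len(kept) <= 1:
            break
        subset = [clips[i] for i in kept]
        weights = word_importance(words, subset, temperature)
        scores = clip_relevance(words, subset, weights)
        # Rank surviving clips by weighted relevance, highest first.
        order = sorted(range(len(kept)), key=lambda j: scores[j], reverse=True)
        n_keep = max(1, math.ceil(len(kept) * keep_ratio))
        kept = sorted(kept[j] for j in order[:n_keep])
    return kept


if __name__ == "__main__":
    words = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    clips = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0], [0.0, -1.0, 0.0]]
    print(rank_and_filter(words, clips, keep_ratio=0.5, rounds=2))  # [0]
```

With `keep_ratio=0.5`, four clips shrink to two after the first round and to one after the second. Re-estimating word weights on the surviving clips each round is one possible reading of "iteratively refine"; the paper's actual modules may weight words and prune clips differently.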
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques
