QD-VMR: Query Debiasing with Contextual Understanding Enhancement for Video Moment Retrieval
Chenghua Gao, Min Li, Jianshuo Liu, Junxing Ren, Lin Chen, Haoyu Liu, Bo Meng, Jitao Fu, Wenwen Su

TL;DR
This paper introduces QD-VMR, a video moment retrieval model that debiases query features and strengthens contextual understanding so that it retrieves relevant video segments more accurately, achieving state-of-the-art results.
Contribution
The paper proposes a query debiasing framework with enhanced contextual understanding for VMR. It combines video clip-query feature alignment, video-query contrastive learning, and a DETR-based prediction structure.
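DETR-style predictors emit a fixed set of candidate moments and assign them one-to-one to the ground-truth moments through bipartite matching. This page does not state QD-VMR's matching cost, so the sketch below assumes a common choice for 1D moments: L1 distance on normalized (start, end) pairs minus the temporal generalized IoU. It brute-forces the assignment, which is only practical for a handful of candidates; real implementations typically use the Hungarian algorithm instead. All function names and weights here are hypothetical.

```python
from itertools import permutations

Span = tuple[float, float]  # hypothetical: (start, end), normalized to [0, 1]


def temporal_giou(a: Span, b: Span) -> float:
    """Generalized IoU for two 1D intervals (can be negative when they are disjoint)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    hull = max(a[1], b[1]) - min(a[0], b[0])  # smallest interval covering both
    iou = inter / union if union > 0 else 0.0
    return iou - (hull - union) / hull if hull > 0 else iou


def match_cost(pred: Span, gt: Span, w_l1: float = 1.0, w_giou: float = 1.0) -> float:
    """Assumed cost: weighted L1 on boundaries minus weighted temporal gIoU."""
    l1 = abs(pred[0] - gt[0]) + abs(pred[1] - gt[1])
    return w_l1 * l1 - w_giou * temporal_giou(pred, gt)


def bipartite_match(preds: list[Span], gts: list[Span]) -> list[tuple[int, int]]:
    """Return (pred_idx, gt_idx) pairs with minimal total cost, by exhaustive search.

    Requires len(preds) >= len(gts); unmatched predictions are treated as "no moment".
    """
    best: list[tuple[int, int]] = []
    best_cost = float("inf")
    # Each permutation picks one distinct prediction per ground-truth moment.
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost = cost
            best = [(p, g) for g, p in enumerate(perm)]
    return best
```

For example, with candidates `[(0.0, 0.2), (0.5, 0.8), (0.1, 0.9)]` and a single ground-truth moment `(0.52, 0.78)`, the second candidate is matched, because it has both the smallest boundary error and the highest overlap.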
Findings
Achieves state-of-the-art performance on three benchmark datasets.
Effectively improves cross-modal understanding and query relevance filtering.
Demonstrates the effectiveness of query debiasing and visual enhancement modules.
Abstract
Video Moment Retrieval (VMR) aims to retrieve the moments of an untrimmed video that correspond to a natural-language query. While cross-modal interaction approaches have made progress in filtering out query-irrelevant information in videos, they assume precise alignment between the query semantics and the corresponding video moments, potentially overlooking misunderstandings of the natural-language semantics. To address this challenge, we propose QD-VMR, a query debiasing model with enhanced contextual understanding. First, a Global Partial Aligner module aligns video clip and query features and applies video-query contrastive learning to strengthen the model's cross-modal understanding. Next, a Query Debiasing Module efficiently obtains debiased query features, and a Visual Enhancement module refines the video…
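The abstract names video-query contrastive learning as part of the Global Partial Aligner but does not give its formulation. A common choice is a symmetric InfoNCE loss over a batch: each video's paired query is the positive, and the other queries in the batch are negatives (and vice versa). The sketch below assumes that formulation, one pooled embedding per video and per query, and an illustrative temperature of 0.07; it is not taken from the paper, and the function names are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def log_softmax_at(logits: list[float], idx: int) -> float:
    """log softmax(logits)[idx], computed stably by subtracting the max."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[idx] - log_sum_exp


def video_query_infonce(videos: list[list[float]], queries: list[list[float]],
                        temperature: float = 0.07) -> float:
    """Assumed symmetric InfoNCE: videos[i] and queries[i] form the positive pair."""
    n = len(videos)
    # sim[i][j]: scaled similarity between video i and query j.
    sim = [[cosine(v, q) / temperature for q in queries] for v in videos]
    # Video-to-query direction: rows of the similarity matrix.
    v2q = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # Query-to-video direction: columns of the similarity matrix.
    q2v = -sum(log_softmax_at([sim[j][i] for j in range(n)], i) for i in range(n)) / n
    return (v2q + q2v) / 2
```

Matched pairs with distinct embeddings drive the loss toward zero, while swapping the queries (so each video is paired with the wrong one) makes it large; if every embedding is identical, the loss equals log of the batch size, since the model cannot tell pairs apart.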
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Video Analysis and Summarization
Methods: Attention Is All You Need · Linear Layer · Adam · Layer Normalization · Feedforward Network · Position-Wise Feed-Forward Layer · Dense Connections · Residual Connection · Multi-Head Attention · Convolution
