Selective Query-guided Debiasing for Video Corpus Moment Retrieval
Sunjae Yoon, Ji Woo Hong, Eunseop Yoon, Dahyun Kim, Junyeong Kim, Hee Suk Yoon, and Chang D. Yoo

TL;DR
This paper introduces SQuiDNet, a novel video moment retrieval model that selectively debiases retrieval predictions by leveraging query semantics, improving accuracy and interpretability on multiple benchmarks.
Contribution
The paper proposes a new selective debiasing approach that preserves helpful biases while removing harmful ones, enhancing multi-modal retrieval performance.
Findings
Outperforms existing methods on the TVR, ActivityNet, and DiDeMo benchmarks.
Improves interpretability of retrieval results.
Effectively balances bias removal and preservation.
Abstract
Video moment retrieval (VMR) aims to localize target moments in untrimmed videos pertinent to a given textual query. Existing retrieval systems tend to rely on retrieval bias as a shortcut and thus fail to sufficiently learn multi-modal interactions between query and video. This retrieval bias stems from learning frequent co-occurrence patterns between queries and moments, which spuriously correlate objects (e.g., a pencil) referred to in the query with moments (e.g., a scene of writing with a pencil) where the objects frequently appear in the video, such that they converge into biased moment predictions. Although recent debiasing methods have focused on removing this retrieval bias, we argue that these biased predictions should sometimes be preserved because there are many queries for which biased predictions are rather helpful. To conjugate this retrieval bias, we propose a Selective…
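The co-occurrence shortcut described in the abstract can be illustrated with a toy sketch. This is not the paper's model: the training pairs, the moment labels, and the helper names `build_cooccurrence` and `biased_prediction` are all hypothetical, chosen only to show how a retriever that looks at nothing but the object in the query is right for typical queries and wrong for atypical ones.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (object mentioned in the query, moment label).
TRAIN_PAIRS = [
    ("pencil", "writing"),
    ("pencil", "writing"),
    ("pencil", "writing"),
    ("pencil", "sharpening"),
    ("cup", "drinking"),
    ("cup", "drinking"),
    ("cup", "washing"),
]


def build_cooccurrence(pairs):
    """Count how often each object co-occurs with each moment label."""
    counts = defaultdict(Counter)
    for obj, moment in pairs:
        counts[obj][moment] += 1
    return counts


def biased_prediction(counts, obj):
    """Return the most frequent moment for the object, ignoring the rest of the query."""
    return counts[obj].most_common(1)[0][0]


counts = build_cooccurrence(TRAIN_PAIRS)

# Each test query is reduced to (query text, object, ground-truth moment).
TEST_QUERIES = [
    ("She writes a note with a pencil", "pencil", "writing"),
    ("He sharpens a pencil at the desk", "pencil", "sharpening"),
]

# For each query, record whether the shortcut prediction matches the ground truth.
results = [
    (text, biased_prediction(counts, obj) == truth)
    for text, obj, truth in TEST_QUERIES
]

for text, correct in results:
    print(f"{text!r}: shortcut is {'helpful' if correct else 'harmful'}")
```

In this toy setting the shortcut answers the first query correctly and the second incorrectly, which is the tension the abstract points to: removing the bias entirely would also discard the prediction that was useful for the first query.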
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
