MomentMix Augmentation with Length-Aware DETR for Temporally Robust Moment Retrieval
Seojeong Park, Jiho Choi, Kyungjune Baek, Hyunjung Shim

TL;DR
This paper introduces MomentMix, a data augmentation method, and a length-aware decoder. Together they improve the localization of short moments in video moment retrieval and raise the performance of DETR-based models.
Contribution
Proposes two MomentMix augmentation strategies (ForegroundMix and BackgroundMix) and a length-aware decoder to address the inaccurate localization of short moments in video moment retrieval.
Findings
Outperforms state-of-the-art DETR-based methods on benchmark datasets.
Achieves 9.62% gain in [email protected] on QVHighlights.
Improves mAP by 16.9% on QVHighlights.
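For readers unfamiliar with the metrics above: in moment retrieval, R1 at an IoU threshold is conventionally the fraction of queries whose top-ranked predicted span overlaps the ground-truth span with temporal IoU at or above that threshold. The sketch below illustrates that convention only. It is not the paper's evaluation code, the function names are hypothetical, and it assumes one ground-truth span per query for simplicity.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start_sec, end_sec), assumed start <= end


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two 1-D time spans."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Span], gts: List[Span], iou_threshold: float) -> float:
    """Fraction of queries whose top-1 prediction reaches the IoU threshold.

    Assumption: exactly one ground-truth span per query.
    """
    hits = sum(
        temporal_iou(p, g) >= iou_threshold for p, g in zip(top1_preds, gts)
    )
    return hits / len(gts)
```

Short moments are penalized heavily under this metric: a boundary error of one second barely changes the IoU of a 60-second moment but can halve the IoU of a 2-second one.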
Abstract
Video Moment Retrieval (MR) aims to localize moments within a video based on a given natural language query. Given the prevalent use of platforms like YouTube for information retrieval, the demand for MR techniques is growing significantly. Recent DETR-based models have made notable advances in performance but still struggle to localize short moments accurately. Through data analysis, we identified limited feature diversity in short moments, which motivated the development of MomentMix. MomentMix generates new short-moment samples by employing two augmentation strategies, ForegroundMix and BackgroundMix, which enhance the model's ability to understand query-relevant and query-irrelevant frames, respectively. Additionally, our analysis of prediction bias revealed that predictions for short moments are particularly inaccurate in both center position and length. To address this,…
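The abstract does not say how ForegroundMix and BackgroundMix are implemented, so the following is only a hypothetical sketch of the general idea, not the authors' method. It assumes a video is a list of per-clip features and a moment is a half-open range of clip indices. Under those assumptions, `foreground_mix` cuts one long moment into short pieces and shuffles them among the background runs, and `background_mix` keeps the moment but swaps the surrounding clips for clips from a donor video. All names and parameters here are illustrative.

```python
import random
from typing import Any, List, Sequence, Tuple

Moment = Tuple[int, int]  # (start_clip, end_clip), end exclusive


def foreground_mix(
    clips: Sequence[Any], moment: Moment, num_pieces: int, rng: random.Random
) -> Tuple[List[Any], List[Moment]]:
    """Hypothetical: split a long moment into short pieces and shuffle them
    among the background runs. Returns the new clip sequence and the spans
    of the resulting short moments."""
    start, end = moment
    fg = list(clips[start:end])
    num_pieces = max(1, min(num_pieces, len(fg)))
    # Contiguous, near-equal foreground pieces (integer bounds, none empty).
    bounds = [k * len(fg) // num_pieces for k in range(num_pieces + 1)]
    segments = [(True, fg[bounds[k]:bounds[k + 1]]) for k in range(num_pieces)]
    # The background before and after the moment each stay intact as a segment.
    for run in (clips[:start], clips[end:]):
        if len(run) > 0:
            segments.append((False, list(run)))
    rng.shuffle(segments)
    mixed: List[Any] = []
    spans: List[Moment] = []
    for is_foreground, seg in segments:
        if is_foreground:
            spans.append((len(mixed), len(mixed) + len(seg)))
        mixed.extend(seg)
    return mixed, spans


def background_mix(
    clips: Sequence[Any], moment: Moment, donor_clips: Sequence[Any], rng: random.Random
) -> List[Any]:
    """Hypothetical: keep the query-relevant clips and replace every
    background clip with a randomly chosen clip from another video."""
    start, end = moment
    mixed = list(clips)
    for i in range(len(mixed)):
        if not (start <= i < end):
            mixed[i] = rng.choice(donor_clips)
    return mixed
```

In this sketch the query text is unchanged in both cases: the first function varies where and how long the relevant content is, while the second varies what surrounds it.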
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Multimodal Machine Learning Applications
