Augmenting Moment Retrieval: Zero-Dependency Two-Stage Learning
Zhengxuan Wei, Jiajin Tang, Sibei Yang

TL;DR
This paper introduces AMR, a zero-dependency two-stage learning framework for moment retrieval that enhances boundary and semantic discrimination without additional data, significantly improving performance on benchmarks.
Contribution
The paper proposes a zero-external-dependency augmentation method and a two-stage training framework that uses curriculum learning and distillation to improve moment retrieval.
Findings
AMR outperforms previous state-of-the-art methods on multiple benchmarks.
The augmentation effectively resolves boundary ambiguity and semantic confusion.
The two-stage training improves boundary and semantic understanding.
Abstract
Existing Moment Retrieval methods face three critical bottlenecks: (1) data scarcity forces models into shallow keyword-feature associations; (2) boundary ambiguity in transition regions between adjacent events; (3) insufficient discrimination of fine-grained semantics (e.g., distinguishing "kicking" vs. "throwing" a ball). In this paper, we propose a zero-external-dependency Augmented Moment Retrieval framework, AMR, designed to overcome local optima caused by insufficient data annotations and the lack of robust boundary and semantic discrimination capabilities. AMR is built upon two key insights: (1) it resolves ambiguous boundary information and semantic confusion in existing annotations without additional data (avoiding costly manual labeling), and (2) it preserves boundary and semantic discriminative capabilities enhanced by training while generalizing to real-world scenarios,…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
