The Devil is in the Spurious Correlations: Boosting Moment Retrieval with Dynamic Learning
Xinyang Zhou, Fanyue Wei, Lixin Duan, Angela Yao, Wen Li

TL;DR
This paper introduces a dynamic learning approach to improve video moment retrieval by reducing spurious correlations between queries and background frames, achieving state-of-the-art results.
Contribution
It proposes a novel video synthesis and text-dynamics interaction method to mitigate spurious correlations in transformer-based moment retrieval models.
Findings
Significant performance improvement on QVHighlights and Charades-STA benchmarks.
Effective reduction of background over-association in moment prediction.
Demonstrated generalization across different architectures.
Abstract
Given a textual query and a corresponding video, moment retrieval aims to localize the moments relevant to the query within the video. While existing transformer-based approaches have demonstrated commendable results, predicting the accurate temporal span of the target moment remains a major challenge. This paper reveals that a crucial reason is the spurious correlation between the text query and the moment context. Namely, the model makes predictions by overly associating queries with background frames rather than distinguishing target moments. To address this issue, we propose a dynamic learning approach for moment retrieval, where two strategies are designed to mitigate the spurious correlation. First, we introduce a novel video synthesis approach to construct a dynamic context for the queried moment, enabling the model to attend to the target…
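To make the idea of a "dynamic context" concrete, here is a minimal sketch of one way a queried moment could be placed inside different background content so that the query stays paired with the same moment while its surroundings change. This is an illustrative assumption, not the authors' implementation: the abstract shown here is truncated and does not specify how synthesis is done. The function name `synthesize_context`, the clip-level string placeholders, and the random insertion rule are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def synthesize_context(
    moment: Sequence[str],
    background: Sequence[str],
    rng: random.Random,
) -> Tuple[List[str], Tuple[int, int]]:
    """Insert the queried moment at a random position among background clips.

    Hypothetical helper for illustration only. Returns the synthesized clip
    sequence and the half-open span [start, end) that the moment occupies in it.
    """
    if not moment:
        raise ValueError("moment must contain at least one clip")
    # Number of background clips that precede the moment (0..len(background)).
    start = rng.randint(0, len(background))
    synthesized = list(background[:start]) + list(moment) + list(background[start:])
    return synthesized, (start, start + len(moment))


# Strings stand in for per-clip features; real inputs would be feature vectors.
rng = random.Random(0)
moment = ["m0", "m1", "m2"]
background = ["b0", "b1", "b2", "b3", "b4"]
video, (start, end) = synthesize_context(moment, background, rng)
```

Because the background is drawn independently of the query, a model trained on such samples cannot rely on background frames to locate the moment; the returned span serves as the new ground-truth label.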
Taxonomy
Topics: Time Series Analysis and Forecasting · Video Analysis and Summarization · Music and Audio Processing
