SCANet: Scene Complexity Aware Network for Weakly-Supervised Video Moment Retrieval
Sunjae Yoon, Gwanhyeong Koo, Dahyun Kim, Chang D. Yoo

TL;DR
SCANet introduces a scene complexity-aware approach for weakly-supervised video moment retrieval, adaptively generating proposals based on scene complexity to improve localization accuracy across diverse videos.
Contribution
The paper proposes SCANet, a novel network that measures scene complexity and adapts proposal generation, addressing limitations of fixed heuristics in existing weakly-supervised VMR systems.
Findings
Achieves state-of-the-art results on the Charades-STA, ActivityNet, and TVR benchmarks.
Effectively models scene complexity to improve proposal relevance.
Demonstrates the importance of adaptive proposals in weakly-supervised settings.
Abstract
Video moment retrieval (VMR) aims to localize moments in a video corresponding to a given language query. To avoid the expensive cost of annotating the temporal moments, weakly-supervised VMR (wsVMR) systems have been studied. For such systems, generating a number of proposals as moment candidates and then selecting the most appropriate proposal has been a popular approach. These proposals are assumed to contain many distinguishable scenes in a video as candidates. However, existing proposals of wsVMR systems do not respect the varying numbers of scenes in each video, where the proposals are heuristically determined irrespective of the video. We argue that the retrieval system should be able to counter the complexities caused by varying numbers of scenes in each video. To this end, we present a novel concept of a retrieval system referred to as Scene Complexity Aware Network (SCANet), which…
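To make the contrast in the abstract concrete, the following is a minimal illustrative sketch, not SCANet's actual method. It compares a fixed sliding-window proposal heuristic, which ignores video content, with a toy content-dependent alternative whose candidate count grows with an estimated number of scenes. Everything here is an assumption made for illustration: the function names, the window sizes, the cosine-similarity threshold, and the use of consecutive-clip similarity as a stand-in for scene complexity.

```python
import numpy as np


def fixed_proposals(num_clips, window_sizes=(4, 8, 16), stride_ratio=0.5):
    """Heuristic sliding windows: identical for every video of the same length."""
    proposals = []
    for w in window_sizes:
        if w > num_clips:
            continue  # window does not fit in this video
        stride = max(1, int(w * stride_ratio))
        for start in range(0, num_clips - w + 1, stride):
            proposals.append((start, start + w))
    return proposals


def scene_boundaries(clip_features, threshold=0.5):
    """Toy complexity proxy (assumption): place a scene cut wherever the cosine
    similarity between consecutive clip features drops below `threshold`."""
    feats = np.asarray(clip_features, dtype=float)
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    normed = feats / np.maximum(norms, 1e-8)  # guard against zero vectors
    sims = np.sum(normed[:-1] * normed[1:], axis=1)
    cuts = (np.nonzero(sims < threshold)[0] + 1).tolist()
    return [0] + cuts + [len(feats)]


def adaptive_proposals(clip_features, threshold=0.5):
    """One proposal per contiguous run of estimated scenes, so the number of
    candidates grows with the estimated scene count of the video."""
    bounds = scene_boundaries(clip_features, threshold)
    return [
        (bounds[i], bounds[j])
        for i in range(len(bounds))
        for j in range(i + 1, len(bounds))
    ]
```

For a 12-clip video made of three visually distinct 4-clip scenes, this sketch yields six scene-aligned candidates, while a 12-clip single-scene video yields one. The fixed heuristic returns the same seven windows in both cases, which is the mismatch the abstract attributes to heuristically determined proposals.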
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
Methods: Attentive Walk-Aggregating Graph Neural Network
