Who Can We Trust? Scope-Aware Video Moment Retrieval with Multi-Agent Conflict
Chaochen Wu, Guan Luo, Meiyun Zuo, Zhitao Fan

TL;DR
This paper presents a novel multi-agent reinforcement learning framework for video moment retrieval that effectively resolves conflicts between models and can identify out-of-scope queries without extra training.
Contribution
It introduces a multi-agent system with evidential learning for conflict resolution and out-of-scope detection in video moment retrieval, enhancing accuracy and real-world applicability.
Findings
Outperforms state-of-the-art methods on benchmark datasets.
Modeling agent conflict improves retrieval accuracy.
Effective out-of-scope detection without additional training.
Abstract
Video moment retrieval uses a text query to locate a moment in a given untrimmed video. Locating video moments with text queries helps people interact with videos efficiently. Current solutions for this task do not consider conflicts between the localization results of different models, so multiple models cannot be integrated correctly to produce better results. This study introduces a reinforcement learning-based video moment retrieval model that scans the whole video once to find the moment's boundaries while producing evidence for its localization. Moreover, we propose a multi-agent system framework that uses evidential learning to resolve conflicts between the agents' localization outputs. As a by-product of observing and handling conflicts between agents, we can decide whether a query has no corresponding moment in a video (out-of-scope) without additional…
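To make the idea of "evidential conflict between agents" concrete, the sketch below uses a generic subjective-logic formulation: each agent emits non-negative evidence over K candidate segments, which is turned into belief masses and an uncertainty mass, and two agents' opinions are fused with Dempster's combination rule. The fused result exposes a conflict score that could, under these assumptions, flag a query as out-of-scope. This is not the authors' published method; the candidate-segment discretization, the function names (`opinion_from_evidence`, `combine`, `is_out_of_scope`) and both thresholds are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# An opinion is (belief masses over K candidate segments, uncertainty mass).
Opinion = Tuple[List[float], float]


def opinion_from_evidence(evidence: List[float]) -> Opinion:
    """Map non-negative evidence e_k to beliefs b_k = e_k / S and
    uncertainty u = K / S, where S = sum(e_k + 1) is the Dirichlet strength."""
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    strength = sum(e + 1.0 for e in evidence)
    beliefs = [e / strength for e in evidence]
    return beliefs, len(evidence) / strength


def combine(op1: Opinion, op2: Opinion) -> Tuple[List[float], float, float]:
    """Fuse two opinions with Dempster's rule; also return the conflict C,
    i.e. the belief mass the agents place on *different* segments."""
    b1, u1 = op1
    b2, u2 = op2
    if len(b1) != len(b2):
        raise ValueError("opinions must cover the same candidate segments")
    conflict = sum(b1[i] * b2[j]
                   for i in range(len(b1)) for j in range(len(b2)) if i != j)
    scale = 1.0 - conflict  # always > 0 here, since u1 * u2 > 0
    fused = [(b1[k] * b2[k] + b1[k] * u2 + b2[k] * u1) / scale
             for k in range(len(b1))]
    return fused, u1 * u2 / scale, conflict


def is_out_of_scope(op1: Opinion, op2: Opinion,
                    conflict_threshold: float = 0.5,
                    uncertainty_threshold: float = 0.5) -> bool:
    """Hypothetical rule: treat the query as out-of-scope when the agents
    strongly disagree or when the fused opinion remains too uncertain."""
    _, fused_u, conflict = combine(op1, op2)
    return conflict > conflict_threshold or fused_u > uncertainty_threshold


if __name__ == "__main__":
    agree = combine(opinion_from_evidence([9, 0, 0]), opinion_from_evidence([9, 0, 0]))
    clash = combine(opinion_from_evidence([9, 0, 0]), opinion_from_evidence([0, 9, 0]))
    print("agreeing agents: conflict =", round(agree[2], 4))
    print("disagreeing agents: conflict =", round(clash[2], 4))
```

With three candidate segments, two agents that both put evidence 9 on the same segment produce zero conflict and a confident fused belief, while agents that back different segments produce a conflict of 0.5625 and would be flagged under the hypothetical 0.5 threshold. Fusing with Dempster's rule keeps the beliefs plus uncertainty summing to one, which is why the conflict term doubles as a natural disagreement signal.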
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
