Who Can We Trust? Scope-Aware Video Moment Retrieval with Multi-Agent Conflict

Chaochen Wu; Guan Luo; Meiyun Zuo; Zhitao Fan

arXiv:2511.00370·cs.CV·November 4, 2025

TL;DR

This paper presents a multi-agent framework for video moment retrieval, built on reinforcement learning agents, that uses evidential learning to resolve conflicts between agents' localization outputs and can identify out-of-scope queries without extra training.

Contribution

It introduces a multi-agent system with evidential learning for conflict resolution and out-of-scope detection in video moment retrieval, enhancing accuracy and real-world applicability.

Findings

01 Outperforms state-of-the-art methods on benchmark datasets.

02 Modeling agent conflict improves retrieval accuracy.

03 Effective out-of-scope detection without additional training.

Abstract

Video moment retrieval uses a text query to locate a moment in a given untrimmed reference video. Locating video moments from text queries helps people interact with videos efficiently. Current solutions for this task do not consider conflicts among the localization results of different models, so the models cannot be integrated properly to produce better results. This study introduces a reinforcement learning-based video moment retrieval model that scans the whole video once to find the moment's boundary while producing evidence for its localization. Moreover, we propose a multi-agent system framework that uses evidential learning to resolve conflicts between agents' localization outputs. As a by-product of observing and resolving conflicts between agents, we can decide whether a query has no corresponding moment in a video (out-of-scope) without additional…
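To make the idea of evidence-based conflict resolution concrete, here is a minimal Python sketch. It is not the paper's method, which the abstract does not specify in detail. It assumes each agent emits non-negative evidence over a fixed set of candidate segments, converts that evidence to a subjective-logic opinion, and fuses two opinions with the reduced Dempster combination rule that is common in evidential learning. All function names, the segment discretization, and the `threshold` value are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# An opinion is (belief mass per candidate segment, uncertainty mass).
Opinion = Tuple[List[float], float]


def opinion_from_evidence(evidence: List[float]) -> Opinion:
    """Map non-negative evidence to belief masses plus an uncertainty mass."""
    k = len(evidence)
    strength = sum(evidence) + k  # Dirichlet strength, with alpha = evidence + 1
    belief = [e / strength for e in evidence]
    return belief, k / strength


def conflict(op1: Opinion, op2: Opinion) -> float:
    """Belief mass the two agents assign to different segments."""
    b1, _ = op1
    b2, _ = op2
    return sum(b1) * sum(b2) - sum(x * y for x, y in zip(b1, b2))


def fuse(op1: Opinion, op2: Opinion) -> Opinion:
    """Reduced Dempster combination of two opinions."""
    b1, u1 = op1
    b2, u2 = op2
    scale = 1.0 - conflict(op1, op2)  # renormalize after discarding conflict
    belief = [(x * y + x * u2 + y * u1) / scale for x, y in zip(b1, b2)]
    return belief, (u1 * u2) / scale


def is_out_of_scope(op1: Opinion, op2: Opinion, threshold: float = 0.5) -> bool:
    """Hypothetical rule: treat strong disagreement as an out-of-scope signal."""
    return conflict(op1, op2) > threshold


# Two agents that mostly agree on segment 0, and two that disagree.
agree_a = opinion_from_evidence([9.0, 0.0, 0.0])
agree_b = opinion_from_evidence([8.0, 1.0, 0.0])
clash_a = opinion_from_evidence([9.0, 0.0, 0.0])
clash_b = opinion_from_evidence([0.0, 0.0, 9.0])

fused_belief, fused_uncertainty = fuse(agree_a, agree_b)
```

In this sketch, agreement between agents lowers the fused uncertainty below either agent's own, while agents that back different segments produce a large conflict value. Thresholding that value is one simple way to flag a query as out-of-scope without training a separate detector; the decision rule the authors actually use may differ.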

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization