LongVideoAgent: Multi-Agent Reasoning with Long Videos

Runtao Liu; Ziyi Liu; Jiaqi Tang; Yue Ma; Renjie Pi; Jipeng Zhang; Qifeng Chen

arXiv:2512.20618 · cs.AI · December 24, 2025

TL;DR

This paper introduces LongVideoAgent, a multi-agent system for reasoning over hour-long videos. A master LLM directs a grounding agent to localize question-relevant segments and a vision agent to extract visual details, and the system outperforms non-agent baselines on two new episode-level datasets, LongTVQA and LongTVQA+.

Contribution

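The paper presents a multi-agent framework for long-video question answering in which the master agent is trained with reinforcement learning. It targets two limitations of prior methods: compressing content into lossy summaries and relying on limited toolsets, both of which weaken temporal grounding and miss fine-grained cues.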

Findings

01 Outperforms non-agent baselines on the LongTVQA and LongTVQA+ datasets.

02 Reinforcement learning improves the master agent's reasoning and planning.

03 Provides interpretable reasoning trajectories.

Abstract

Recent advances in multimodal LLMs and systems that use tools for long-video QA point to the promise of reasoning over hour-long episodes. However, many methods still compress content into lossy summaries or rely on limited toolsets, weakening temporal grounding and missing fine-grained cues. We propose a multi-agent framework in which a master LLM coordinates a grounding agent to localize question-relevant segments and a vision agent to extract targeted textual observations. The master agent plans with a step limit, and is trained with reinforcement learning to encourage concise, correct, and efficient multi-agent cooperation. This design helps the master agent focus on relevant clips via grounding, complements subtitles with visual detail, and yields interpretable trajectories. On our proposed LongTVQA and LongTVQA+ which are episode-level datasets aggregated from TVQA/TVQA+, our…
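The coordination pattern described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Action`, `run_master`, `toy_policy`, `toy_ground`, `toy_look`) is hypothetical, the agents are hard-coded stubs standing in for real models, and the reinforcement-learning training of the master agent is not shown. It only shows a master policy choosing between grounding, visual inspection, and answering under a step limit, while recording a readable trajectory.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Clip = Tuple[float, float]      # (start_sec, end_sec); an assumed representation
History = List[Tuple[str, str]]  # (action kind, textual observation) pairs


@dataclass
class Action:
    kind: str           # "ground", "look", or "answer" (hypothetical action set)
    argument: str = ""  # query for the vision agent, or the final answer text


def run_master(
    question: str,
    policy: Callable[[str, History, bool], Action],
    ground: Callable[[str], Clip],
    look: Callable[[Clip, str], str],
    max_steps: int = 5,
) -> Tuple[str, History]:
    """Run the master loop for at most max_steps tool calls, then answer."""
    history: History = []
    clip: Optional[Clip] = None
    for _ in range(max_steps):
        action = policy(question, history, False)
        if action.kind == "answer":
            return action.argument, history
        if action.kind == "ground":
            clip = ground(question)  # localize a question-relevant segment
            observation = f"clip={clip}"
        elif action.kind == "look" and clip is not None:
            observation = look(clip, action.argument)  # targeted visual detail
        else:
            observation = "invalid action"
        history.append((action.kind, observation))
    # Step limit reached: ask the policy for a final answer.
    return policy(question, history, True).argument, history


# Toy stand-ins for the three agents (all assumptions, not real models).
def toy_ground(question: str) -> Clip:
    return (120.0, 150.0)


def toy_look(clip: Clip, query: str) -> str:
    return "a red mug on the table"


def toy_policy(question: str, history: History, must_answer: bool) -> Action:
    if must_answer or len(history) >= 2:
        return Action("answer", history[-1][1] if history else "unknown")
    if not history:
        return Action("ground")
    return Action("look", question)


answer, trajectory = run_master(
    "What is on the table?", toy_policy, toy_ground, toy_look
)
for kind, observation in trajectory:
    print(kind, "->", observation)
print("answer:", answer)
```

The returned `trajectory` is a plain list of action and observation pairs, which is one simple way to keep the reasoning steps inspectable; the step limit bounds how many agent calls the master can make before it must commit to an answer.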

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Video Analysis and Summarization