Weaver: End-to-End Agentic System Training for Video Interleaved Reasoning

Yudi Shi; Shangzhe Di; Qirui Chen; Qinian Wang; Jiayin Cai; Xiaolong Jiang; Yao Hu; Weidi Xie

arXiv:2602.05829·cs.CV·February 6, 2026

Open Access

TL;DR

Weaver is an end-to-end trainable multimodal reasoning system that dynamically invokes tools during reasoning and is trained with reinforcement learning, improving performance on complex video reasoning tasks, including those involving long videos.

Contribution

Weaver introduces a novel end-to-end trainable system that dynamically invokes tools and employs reinforcement learning for improved video reasoning.

Findings

01 Enhanced performance on complex video reasoning benchmarks

02 Effective integration of tool invocation and reinforcement learning

03 Improved reasoning over long videos

Abstract

Video reasoning constitutes a comprehensive assessment of a model's capabilities, as it demands robust perceptual and interpretive skills, thereby serving as a means to explore the boundaries of model performance. While recent research has leveraged text-centric Chain-of-Thought reasoning to augment these capabilities, such approaches frequently suffer from representational mismatch and are restricted by limited perceptual acuity. To address these limitations, we propose Weaver, a novel, end-to-end trainable multimodal reasoning agentic system. Weaver empowers its policy model to dynamically invoke diverse tools throughout the reasoning process, enabling progressive acquisition of crucial visual cues and construction of authentic multimodal reasoning trajectories. Furthermore, we integrate a reinforcement learning algorithm to allow the system to freely explore strategies for employing and…

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics