ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning

Zuhao Yang; Kaichen Zhang; Sudong Wang; Keming Wu; Zhongyu Yang; Bo Li; Xiaojuan Qi; Shijian Lu; Xingxuan Li; Lidong Bing

arXiv:2605.20342 · cs.CV · May 22, 2026


TL;DR

ParaVT introduces a multi-agent RL framework for parallel video tool calling. It improves long-video understanding by letting the model issue several tool calls in a single turn, and it addresses the training instability induced by pretrained tool priors.

Contribution

It presents the first end-to-end RL framework for parallel tool calling in video understanding and proposes PARA-GRPO to stabilize training against the instabilities introduced by pretrained tool priors.
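The name PARA-GRPO suggests it builds on Group Relative Policy Optimization (GRPO), but this page does not describe its modifications. For reference only, the following is a minimal sketch of the *standard* GRPO group-relative advantage, A_i = (r_i - mean(r)) / (std(r) + eps), computed over the G rollouts sampled for one prompt. It is vanilla GRPO, not PARA-GRPO; the function name `grpo_advantages`, the `eps` value, and the use of the population standard deviation are assumptions (some implementations use the sample standard deviation instead).

```python
from statistics import fmean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Standard (vanilla) GRPO advantage, not PARA-GRPO.

    Each rollout's reward is normalized by the mean and population
    standard deviation of its group (all rollouts for one prompt).
    """
    mu = fmean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for a group of G = 4 rollouts of one prompt.
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])
print(adv)  # approximately [1.0, -1.0, 1.0, -1.0]

# A group in which every rollout earns the same reward yields all-zero advantages.
print(grpo_advantages([1.0, 1.0, 1.0]))  # [0.0, 0.0, 0.0]
```

Because advantages are relative within a group, a group in which every rollout receives the same reward contributes no gradient signal, and any behavior that earns reward as often as the alternatives (for example, answering without calling a tool) is not penalized relative to them.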

Findings

1. ParaVT improves long-video understanding benchmarks by +7.9% on average.

2. PARA-GRPO increases format compliance from 0.13 to 0.64.

3. The framework demonstrates better fault tolerance and context management in multi-turn tool calls.

Abstract

Training large multimodal models (LMMs) via reinforcement learning (RL) to natively invoke video-processing tools (e.g., cropping) has become a promising route to long-video understanding. However, existing native-RL methods dispatch tool calls sequentially (i.e., one per turn): a single wrong crop propagates errors without peer correction, multi-turn tool calls corrupt context, and inference cost scales linearly with the number of turns. We introduce ParaVT, the first multi-agent end-to-end RL-trained framework for Parallel Video Tool calling, dispatching multiple time-window crops in a single turn for cleaner context and better fault tolerance. Yet applying standard RL to ParaVT reveals an obstacle we term the Tool Prior Paradox: the pretrained tool priors that enable tool exploration also destabilize cold-started structural format and expose the skip-tool reward shortcut under…
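To make the sequential-versus-parallel contrast concrete, the following is a minimal sketch, not ParaVT's implementation. A hypothetical `crop_clip` tool stands in for the video-cropping tool, and a hypothetical `parallel_crop` dispatcher issues several time-window crops within a single turn using a thread pool. Each call's failure is captured in its own `CropResult` (also hypothetical), so an invalid window does not prevent the other crops from returning; a sequential agent would instead spend one turn per crop. This illustrates only error isolation at dispatch time. ParaVT's peer correction happens in the model's reasoning over the returned crops, which the sketch does not model.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CropResult:
    start: float
    end: float
    clip: Optional[str] = None   # placeholder for the cropped clip's content
    error: Optional[str] = None  # set when the crop call fails


def crop_clip(video_len: float, start: float, end: float) -> str:
    """Hypothetical crop tool: validates the window and returns a text placeholder."""
    if not (0 <= start < end <= video_len):
        raise ValueError(f"invalid window [{start}, {end}]")
    return f"clip[{start:.0f}-{end:.0f}s]"


def parallel_crop(video_len: float, windows: List[Tuple[float, float]],
                  max_workers: int = 4) -> List[CropResult]:
    """Dispatch all crop calls in one turn; a failing call is recorded, not fatal."""
    def run(window: Tuple[float, float]) -> CropResult:
        start, end = window
        try:
            return CropResult(start, end, clip=crop_clip(video_len, start, end))
        except ValueError as exc:
            return CropResult(start, end, error=str(exc))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input windows.
        return list(pool.map(run, windows))


# One turn, three crops; the last window overruns a 600 s video and fails alone.
results = parallel_crop(600.0, [(0, 60), (300, 360), (590, 700)])
for r in results:
    print(r.start, r.end, r.clip or r.error)
```

In this setup the two valid crops still return even though the third window is out of range, and all three are resolved in one turn rather than three.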


Code & Models

Models

ParaVT/ParaVT-8B (Hugging Face model)
