Video-Browser: Towards Agentic Open-web Video Browsing

Zhengyang Liang; Yan Shu; Xiangrui Liu; Minghao Qin; Kaixin Liang; Nicu Sebe; Zheng Liu; Lizi Liao

arXiv:2512.23044·cs.CV·January 19, 2026

TL;DR

This paper introduces Video-Browser, an agentic framework for open-web video browsing that balances visual perception against efficiency, achieving a 37.5% relative improvement over baseline methods while using 58.3% fewer tokens than direct visual inference.

Contribution

We formalize the task of Agentic Video Browsing, propose the Video-Browser framework with Pyramidal Perception, and establish a benchmark for open-ended video exploration.

Findings

1. Achieved a 37.5% relative improvement over baseline methods.
2. Reduced token consumption by 58.3% compared to direct visual inference.
3. Established a foundation for verifiable open-web video research.

Abstract

The evolution of autonomous agents is redefining information seeking, transitioning from passive retrieval to proactive, open-ended web research. However, a significant modality gap remains in processing the web's most dynamic and information-dense modality: video. In this paper, we first formalize the task of Agentic Video Browsing and introduce Video-BrowseComp, a benchmark evaluating open-ended agentic browsing tasks that enforce a mandatory dependency on videos. We observe that current paradigms struggle to reconcile the scale of open-ended video exploration with the need for fine-grained visual verification. Direct visual inference (e.g., RAG) maximizes perception but incurs prohibitive context costs, while text-centric summarization optimizes efficiency but often misses critical visual details required for accurate grounding. To address this, we propose Video-Browser, a novel…

Code & Models

Datasets

chr1ce/Video-Browsecomp
dataset · 36 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Artificial Intelligence in Games