Video-ToC: Video Tree-of-Cue Reasoning

Qizhong Tan; Zhuotao Tian; Guangming Lu; Jun Yu; Wenjie Pei

arXiv:2604.20473·cs.CV·April 23, 2026

TL;DR

Video-ToC introduces a tree-of-cue reasoning framework for video understanding, combining structured visual cues, adaptive reward mechanisms, and new datasets to improve reasoning capabilities in Video LLMs.

Contribution

The paper presents a tree-of-cue reasoning approach that combines structured visual cues, adaptive rewards, and new datasets to improve video understanding in Video LLMs.

Findings

01 Outperforms baselines on six video understanding benchmarks.

02 Demonstrates improved reasoning and perception in video analysis.

03 Achieves superior results on a video hallucination benchmark.

Abstract

Existing Video Large Language Models (Video LLMs) struggle with complex video understanding, exhibiting limited reasoning capabilities and potential hallucinations. In particular, these methods tend to perform reasoning solely relying on the pretrained inherent reasoning rationales whilst lacking perception-aware adaptation to the input video content. To address this, we propose Video-ToC, a novel video reasoning framework that enhances video understanding through tree-of-cue reasoning. Specifically, our approach introduces three key innovations: (1) A tree-guided visual cue localization mechanism, which endows the model with enhanced fine-grained perceptual capabilities through structured reasoning patterns; (2) A reasoning-demand reward mechanism, which dynamically adjusts the reward value for reinforcement learning (RL) based on the estimation of reasoning demands, enabling…

Code & Models

Repositories

qizhongtan/Video-ToC (GitHub)