VideoForest: Person-Anchored Hierarchical Reasoning for Cross-Video Question Answering

Yiran Meng; Junhong Ye; Wei Zhou; Guanghui Yue; Xudong Mao; Ruomei Wang; Baoquan Zhao

arXiv:2508.03039·cs.CV·August 6, 2025

TL;DR

VideoForest is a person-anchored hierarchical reasoning framework for cross-video question answering. It uses person-level features as bridge points between videos and organizes visual content in a multi-granularity tree built around person trajectories, so that questions can be answered across multiple video streams.

Contribution

The paper presents a person-anchored hierarchical reasoning framework that supports cross-video understanding without end-to-end training, together with a new benchmark, CrossVideoQA.

Findings

01 Achieved 71.93% accuracy in person recognition

02 Reached 83.75% accuracy in behavior analysis

03 Attained 51.67% accuracy in summarization and reasoning

Abstract

Cross-video question answering presents significant challenges beyond traditional single-video understanding, particularly in establishing meaningful connections across video streams and managing the complexity of multi-source information retrieval. We introduce VideoForest, a novel framework that addresses these challenges through person-anchored hierarchical reasoning. Our approach leverages person-level features as natural bridge points between videos, enabling effective cross-video understanding without requiring end-to-end training. VideoForest integrates three key innovations: 1) a human-anchored feature extraction mechanism that employs ReID and tracking algorithms to establish robust spatiotemporal relationships across multiple video sources; 2) a multi-granularity spanning tree structure that hierarchically organizes visual content around person-level trajectories; and 3) a…
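
As a rough illustration of the first two ideas in the abstract, the sketch below links person tracks from different videos by comparing ReID embeddings, then groups the linked tracks per person and per video. The `Track` type, the greedy matching rule, the 0.8 threshold, and the toy two-dimensional embeddings are all assumptions made for illustration; this excerpt does not specify the paper's ReID model, tracker, or tree construction, so this is not the authors' implementation.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Track:
    """One person track inside one video (hypothetical structure)."""
    video_id: str
    start: float            # seconds
    end: float              # seconds
    embedding: list[float]  # stand-in for a ReID feature vector


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_tracks(tracks, threshold=0.8):
    """Greedy linking: a track joins the first identity whose first
    member is similar enough, otherwise it starts a new identity.
    The threshold is an illustrative assumption."""
    identities = []
    for track in tracks:
        for group in identities:
            if cosine(group[0].embedding, track.embedding) >= threshold:
                group.append(track)
                break
        else:
            identities.append([track])
    return identities


def build_person_index(identities):
    """Two-level grouping: person id -> video id -> list of (start, end)."""
    index = {}
    for person_id, group in enumerate(identities):
        per_video = index.setdefault(person_id, {})
        for track in group:
            per_video.setdefault(track.video_id, []).append((track.start, track.end))
    return index


# Toy data: the first two tracks have near-identical embeddings.
tracks = [
    Track("cam_A", 0.0, 5.0, [1.0, 0.0]),
    Track("cam_B", 10.0, 20.0, [0.9, 0.1]),
    Track("cam_A", 30.0, 40.0, [0.0, 1.0]),
]
identities = link_tracks(tracks)
index = build_person_index(identities)
```

In this toy run, the person seen in `cam_A` and `cam_B` ends up under one identity, which is the "bridge point" role the abstract assigns to person-level features. A question about that person can then be routed to the relevant time spans in both videos instead of scanning every stream.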
