TraceAV-Bench: Benchmarking Multi-Hop Trajectory Reasoning over Long Audio-Visual Videos
Hengyi Feng, Hao Liang, Mingrui Chen, Bohan Zeng, Meiyi Qiang, Zhengyang Zhao, Zimo Meng, Zeang Sheng, Wentao Zhang

TL;DR
TraceAV-Bench is a benchmark that evaluates multi-hop reasoning and multimodal hallucination robustness over long audio-visual videos, and it reveals significant challenges for current models.
Contribution
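It introduces the first benchmark to jointly evaluate multi-hop reasoning over long audio-visual trajectories and multimodal hallucination robustness, with 2,200 validated multiple-choice questions spanning 4 evaluation dimensions and 15 sub-tasks.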
Findings
Current models perform poorly on TraceAV-Bench, with the best model reaching only 68.29%.
Robustness to multimodal hallucination is largely independent of reasoning performance.
The dataset contains 2,200 questions over 578 long videos (339.5 hours in total, about 35 minutes per video), and each question's reasoning chain averages 3.68 hops.
Abstract
Real-world audio-visual understanding requires chaining evidence that is sparse, temporally dispersed, and split across the visual and auditory streams, whereas existing benchmarks largely fail to evaluate this capability. They restrict videos to short clips, isolate modalities, or reduce questions to one-hop perception. We introduce TraceAV-Bench, the first benchmark to jointly evaluate multi-hop reasoning over long audio-visual trajectories and multimodal hallucination robustness. TraceAV-Bench comprises 2,200 rigorously validated multiple-choice questions over 578 long videos, totaling 339.5 hours, spanning 4 evaluation dimensions and 15 sub-tasks. Each question is grounded in an explicit reasoning chain that averages 3.68 hops across a 15.1-minute temporal span. The dataset is built by a three-step semi-automated pipeline followed by a strict quality assurance process. Evaluation of…
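To make the "hops" and "temporal span" statistics concrete, here is a minimal Python sketch of how such per-question figures could be computed from an explicit reasoning chain. The `Hop` and `Question` classes, their field names, and the toy timestamps are all hypothetical illustrations. They are not the benchmark's actual schema or data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hop:
    """One piece of evidence in a reasoning chain (hypothetical schema)."""
    timestamp_s: float  # where the evidence occurs in the video, in seconds
    modality: str       # "audio" or "visual"


@dataclass
class Question:
    """A multiple-choice question grounded in an explicit chain of hops."""
    hops: List[Hop]

    def hop_count(self) -> int:
        return len(self.hops)

    def temporal_span_min(self) -> float:
        # Distance between the earliest and latest evidence, in minutes.
        times = [h.timestamp_s for h in self.hops]
        return (max(times) - min(times)) / 60.0


def dataset_averages(questions: List[Question]) -> Tuple[float, float]:
    """Return (mean hops per question, mean temporal span in minutes)."""
    n = len(questions)
    mean_hops = sum(q.hop_count() for q in questions) / n
    mean_span = sum(q.temporal_span_min() for q in questions) / n
    return mean_hops, mean_span


# Made-up toy data, for illustration only.
toy = [
    Question([Hop(30, "visual"), Hop(480, "audio"), Hop(930, "visual")]),
    Question([Hop(60, "audio"), Hop(300, "visual"),
              Hop(600, "audio"), Hop(960, "visual")]),
]
mean_hops, mean_span = dataset_averages(toy)
```

On this toy data the two questions have 3 and 4 hops, and each spans 15 minutes. The reported averages of 3.68 hops and 15.1 minutes would presumably be the analogous means over all 2,200 questions.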
