PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

Shaoxuan Li, Zhixuan Zhao, Hanze Deng, Zirun Ma, Shulin Tian, Zuyan Liu, Yushi Hu, Haoning Wu, Yuhao Dong, Benlin Liu, Ziwei Liu, and Ranjay Krishna

arXiv:2603.26653 · cs.CV · March 30, 2026

TL;DR

PerceptionComp is a new manually annotated video benchmark designed to evaluate complex perception-centric reasoning involving multiple visual and logical subtasks across diverse domains.

Contribution

It introduces a challenging benchmark with 1,114 questions on 279 videos, highlighting the difficulty of perception-centric long-horizon reasoning for both humans and AI models.

Findings

1. Humans take longer and perform worse when rewatching is disallowed.

2. State-of-the-art models achieve less than 46% accuracy, indicating a significant gap.

3. Perception-centric reasoning remains a major bottleneck for current AI systems.

Abstract

We introduce PerceptionComp, a manually annotated benchmark for complex, long-horizon, perception-centric video reasoning. PerceptionComp is designed so that no single moment is sufficient: answering each question requires combining multiple temporally separated pieces of visual evidence under compositional constraints with conjunctive and sequential logic. The questions span perceptual subtasks such as objects, attributes, relations, locations, actions, and events, and require skills including semantic recognition, visual correspondence, temporal reasoning, and spatial reasoning. The benchmark contains 1,114 highly complex questions on 279 videos from diverse domains, including city walk tours, indoor villa tours, video games, and extreme outdoor sports, and all of it is manually annotated. Human studies show that PerceptionComp requires substantial test-time thinking and repeated perception steps: participants…
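To make the idea of "conjunctive and sequential logic" over "temporally separated pieces of visual evidence" concrete, here is a toy sketch. It is not PerceptionComp's annotation format or evaluation code; the `Evidence` type, the helper functions, and the example facts and timestamps are all hypothetical. Conjunctive logic is modeled as "every required fact is observed somewhere", sequential logic as "the facts are observed in a given temporal order", and a third check asks whether any single timestamp alone would already suffice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Hypothetical piece of visual evidence: a fact observed at time t (seconds)."""
    t: float
    fact: str


def conjunctive(evidence, required):
    """AND logic: True if every required fact is observed at least once."""
    seen = {e.fact for e in evidence}
    return all(f in seen for f in required)


def sequential(evidence, ordered):
    """Sequential logic: True if the facts in `ordered` occur in that temporal
    order (not necessarily adjacent), checked by greedy subsequence matching."""
    idx = 0
    for e in sorted(evidence, key=lambda e: e.t):
        if idx < len(ordered) and e.fact == ordered[idx]:
            idx += 1
    return idx == len(ordered)


def single_moment_sufficient(evidence, required):
    """True if one timestamp alone already contains every required fact,
    i.e. the question could be answered from a single moment."""
    by_time = {}
    for e in evidence:
        by_time.setdefault(e.t, set()).add(e.fact)
    return any(set(required) <= facts for facts in by_time.values())
```

With evidence such as a "red door" at 12 s, "person enters" at 47.5 s, and a "staircase" at 90 s, both `conjunctive` and `sequential` succeed for the right ordering, while `single_moment_sufficient` returns False: no one frame carries all the required facts, which is the property the benchmark is described as enforcing.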

Code & Models

Repositories

hrinnnn/PerceptionComp (GitHub)

Datasets

hrinnnn/PerceptionComp (dataset, 640 downloads)