Seeing Isn't Orienting: A Cognitively Grounded Benchmark Reveals Systematic Orientation Failures in MLLMs
Nazia Tasnim, Keanu Nichols, Yuting Yan, Nicholas Ikechukwu, Elva Zou, Deepti Ghadiyaram, Bryan A. Plummer

TL;DR
The paper introduces DORI, a new benchmark to evaluate object orientation understanding in vision-language models, revealing significant limitations in current systems' ability to perceive and reason about object orientations.
Contribution
DORI is the first comprehensive diagnostic benchmark specifically designed to assess orientation perception in multimodal systems, highlighting their systematic failures.
Findings
Current models achieve only 54.2% accuracy on coarse orientation tasks.
Models perform poorly on tasks requiring reference frame shifts or compound rotations.
Models show a systematic inability to estimate angles and track orientation changes.
Abstract
Object orientation understanding represents a fundamental challenge in visual perception, critical for applications like robotic manipulation and augmented reality. Current vision-language benchmarks fail to isolate this capability, often conflating it with positional relationships and general scene understanding. We introduce DORI (Discriminative Orientation Reasoning Intelligence), a comprehensive benchmark establishing object orientation perception as a primary evaluation target. DORI assesses four dimensions of orientation comprehension: frontal alignment, rotational transformations, relative directional relationships, and canonical orientation understanding. Through carefully curated tasks from 11 datasets spanning 67 object categories across synthetic and real-world scenarios, DORI provides insight into how multimodal systems understand object orientations. Our evaluation of 15…
