Do Multimodal Language Models Really Understand Direction? A Benchmark for Compass Direction Reasoning
Hang Yin, Zhifeng Lin, Xin Liu, Bin Sun, Kan Li

TL;DR
This paper introduces the Compass Direction Reasoning benchmark to evaluate multimodal language models' understanding of spatial and compass directions, revealing current limitations and proposing methods to improve their reasoning abilities.
Contribution
The paper presents a new benchmark for compass direction reasoning and demonstrates that mixdata and chain-of-thought (CoT) fine-tuning significantly enhance model performance.
Findings
Most MLMs struggle with direction tasks, often performing at chance (random-guessing) level.
Training with CDR data alone yields limited improvements.
Mixdata and CoT fine-tuning significantly boost reasoning accuracy.
Abstract
Direction reasoning is essential for intelligent systems to understand the real world. While existing work focuses primarily on spatial reasoning, compass direction reasoning remains underexplored. To address this, we propose the Compass Direction Reasoning (CDR) benchmark, designed to evaluate the direction reasoning capabilities of multimodal language models (MLMs). CDR includes three types of images to test spatial (up, down, left, right) and compass (north, south, east, west) directions. Our evaluation reveals that most MLMs struggle with direction reasoning, often performing at random guessing levels. Experiments show that training directly with CDR data yields limited improvements, as it requires an understanding of real-world physical rules. We explore the impact of mixdata and CoT fine-tuning methods, which significantly enhance MLM performance in compass direction reasoning by…
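To illustrate the link between the spatial and compass directions named in the abstract, here is a minimal sketch. It is not taken from the paper or the CDR benchmark. It assumes a top-down, unmirrored image, and the names `spatial_to_compass` and `north_points` are hypothetical, introduced only for this example.

```python
# Illustrative sketch only; not the paper's method or the CDR data format.
# Assumption: a top-down (map-like), unmirrored image.

SPATIAL = ["up", "right", "down", "left"]     # clockwise order on the image
COMPASS = ["north", "east", "south", "west"]  # clockwise order on a compass


def spatial_to_compass(spatial: str, north_points: str = "up") -> str:
    """Map an image direction to a compass direction.

    `north_points` is the image direction in which north lies
    (a hypothetical parameter for this example).
    """
    # Number of clockwise quarter turns between "up" and north.
    offset = SPATIAL.index(north_points)
    # Undo that rotation to find the compass direction.
    return COMPASS[(SPATIAL.index(spatial) - offset) % 4]


if __name__ == "__main__":
    # With north at the top, right is east.
    print(spatial_to_compass("right"))
    # With north at the right edge, down is east.
    print(spatial_to_compass("down", north_points="right"))
```

The point of the sketch is that a compass answer depends on two pieces of information, the spatial relation in the image and where north lies, whereas a purely spatial answer needs only the first.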
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques · Language, Metaphor, and Cognition
