GePBench: Evaluating Fundamental Geometric Perception for Multimodal Large Language Models
Shangyu Xing, Changhao Xiang, Yuteng Han, Yifan Yue, Zhen Wu, Xinyu Liu, Zhangtai Wu, Fei Zhao, Xinyu Dai

TL;DR
This paper introduces GePBench, a new benchmark to evaluate the geometric perception abilities of multimodal large language models, revealing current deficiencies and demonstrating improvements with targeted training.
Contribution
The paper presents GePBench, the first comprehensive benchmark for geometric perception in MLLMs, and shows how training with this data enhances model performance.
Findings
Current MLLMs perform poorly on geometric perception tasks.
Training with GePBench data significantly improves geometric understanding.
Geometric perception is crucial for advanced multimodal applications.
Abstract
Multimodal large language models (MLLMs) have made significant progress in integrating visual and linguistic understanding. Existing benchmarks typically focus on high-level semantic capabilities, such as scene understanding and visual reasoning, but often overlook a crucial, foundational ability: geometric perception. Geometric perception involves understanding geometric shapes, structures, and spatial relationships, which are essential for supporting higher-level semantic tasks. Despite its importance, this capability remains underexplored in current MLLM research. To address this gap, we introduce GePBench, a novel benchmark designed to assess the geometric perception abilities of MLLMs. Our extensive evaluations reveal that current state-of-the-art MLLMs exhibit significant deficiencies in geometric perception tasks. Furthermore, we show that models trained with GePBench data…
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling
