GeomVerse: A Systematic Evaluation of Large Models for Geometric Reasoning

Mehran Kazemi; Hamidreza Alvari; Ankit Anand; Jialin Wu; Xi Chen; Radu Soricut

arXiv:2312.12241 · cs.CV · December 20, 2023 · 2 cites

TL;DR

This paper systematically evaluates the reasoning abilities of vision-language models (VLMs) on geometry problems and finds limitations in complex multi-step reasoning that previous benchmarks did not reveal.

Contribution

It introduces a synthetic geometry dataset with controllable difficulty levels to systematically assess VLM reasoning capabilities across different complexity axes.
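To make "controllable difficulty" concrete, here is a minimal sketch of procedural generation in which a single `depth` parameter sets how many chained derivations a question needs. It is not the authors' generator: the step types, the templates, and the names `STEPS`, `TEMPLATES`, and `make_problem` are all hypothetical, and the sketch produces text only, whereas the benchmark pairs text with images.

```python
import math
import random

# Hypothetical step types (not from the paper): each maps the previously
# derived length x and a freshly given number k to a new length.
STEPS = {
    "square_diagonal": lambda x, k: x * math.sqrt(2),
    "rectangle_perimeter": lambda x, k: 2 * (x + k),
    "right_triangle_hypotenuse": lambda x, k: math.hypot(x, k),
}

# One sentence template per step type; {p} is the index of the previous length.
TEMPLATES = {
    "square_diagonal":
        "Length {i} is the diagonal of a square whose side is length {p}.",
    "rectangle_perimeter":
        "Length {i} is the perimeter of a rectangle with sides length {p} and {k}.",
    "right_triangle_hypotenuse":
        "Length {i} is the hypotenuse of a right triangle with legs length {p} and {k}.",
}


def make_problem(depth, seed=0):
    """Build a question whose answer needs `depth` chained derivations."""
    rng = random.Random(seed)          # seeded, so generation is reproducible
    value = float(rng.randint(2, 9))   # the only length stated outright
    lines = [f"Length 0 is {value:g}."]
    chain = []
    for i in range(1, depth + 1):
        kind = rng.choice(sorted(STEPS))
        k = rng.randint(2, 9)
        value = STEPS[kind](value, k)  # ground-truth value after this step
        lines.append(TEMPLATES[kind].format(i=i, p=i - 1, k=k))
        chain.append((kind, k))
    lines.append(f"What is length {depth}?")
    return {"question": " ".join(lines), "answer": value, "chain": chain}


if __name__ == "__main__":
    problem = make_problem(depth=3, seed=1)
    print(problem["question"])
    print(round(problem["answer"], 3))
```

Raising `depth` lengthens the reasoning chain while the step vocabulary stays fixed, so in this toy setup any change in accuracy can be attributed to that one axis.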

Findings

01 VLMs perform worse on geometry reasoning than previous benchmarks suggest

02 Higher-depth problems require complex reasoning chains, which expose model limitations

03 The dataset enables targeted evaluation of reasoning depth in vision-language models

Abstract

Large language models have shown impressive results for multi-hop mathematical reasoning when the input question is only textual. Many mathematical reasoning problems, however, contain both text and image. With the ever-increasing adoption of vision language models (VLMs), understanding their reasoning abilities for such problems is crucial. In this paper, we evaluate the reasoning capabilities of VLMs along various axes through the lens of geometry problems. We procedurally create a synthetic dataset of geometry questions with controllable difficulty levels along multiple axes, thus enabling a systematic evaluation. The empirical results obtained using our benchmark for state-of-the-art VLMs indicate that these models are not as capable in subjects like geometry (and, by generalization, other topics requiring similar reasoning) as suggested by previous benchmarks. This is made…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling