GeoDANO: Geometric VLM with Domain Agnostic Vision Encoder

Seunghyuk Cho; Zhenyue Qin; Yang Liu; Youngbin Choi; Seungbeom Lee; Dongwoo Kim

arXiv:2502.11360 · cs.CV · September 29, 2025

TL;DR

GeoDANO is a novel geometric vision-language model with a domain-agnostic encoder that effectively recognizes geometric features and generalizes across diagram styles, outperforming existing methods in plane geometry problem solving.

Contribution

We introduce GeoDANO, a geometric VLM with a domain-agnostic encoder, and a benchmark for geometric feature recognition, addressing limitations of general-purpose vision encoders.

Findings

01. GeoCLIP outperforms existing vision encoders in recognizing geometric features.

02. GeoDANO surpasses specialized methods and GPT-4o on MathVerse.

03. Our benchmark reveals that general-purpose VLMs struggle with geometric feature detection.

Abstract

We introduce GeoDANO, a geometric vision-language model (VLM) with a domain-agnostic vision encoder, for solving plane geometry problems. Although VLMs have been employed for solving geometry problems, their ability to recognize geometric features remains insufficiently analyzed. To address this gap, we propose a benchmark that evaluates the recognition of visual geometric features, including primitives such as dots and lines, and relations such as orthogonality. Our preliminary study shows that vision encoders often used in general-purpose VLMs, e.g., OpenCLIP, fail to detect these features and struggle to generalize across domains. To overcome this limitation, we develop GeoCLIP, a CLIP-based model trained on synthetic geometric diagram-caption pairs. Benchmark results show that GeoCLIP outperforms existing vision encoders in recognizing geometric features. We then propose our VLM,…
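
The abstract describes GeoCLIP as a CLIP-based model trained on diagram-caption pairs. As a rough illustration of what CLIP-style training typically optimizes, the sketch below computes the standard symmetric contrastive loss over a batch of paired embeddings. It is an assumption that GeoCLIP uses this objective unchanged; the abstract does not say. The function names and the temperature value 0.07 are illustrative choices, not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss for a batch of paired embeddings.

    image_embs[k] and text_embs[k] are assumed to come from the same
    diagram-caption pair. The temperature default is hypothetical.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # Cosine similarity between every image and every caption, scaled.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]

    # Image-to-text direction: each row should pick its own caption.
    loss_i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n

    # Text-to-image direction: each column should pick its own diagram.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n

    return 0.5 * (loss_i2t + loss_t2i)
```

The loss is small when each diagram embedding is closest to its own caption embedding and large when pairs are mismatched, for example `clip_contrastive_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])` is far lower than the same call with the captions swapped.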

Taxonomy

Topics: Retinal Imaging and Analysis · Optical Coherence Tomography Applications

Methods: Contrastive Language-Image Pre-training