GeoDANO: Geometric VLM with Domain Agnostic Vision Encoder
Seunghyuk Cho, Zhenyue Qin, Yang Liu, Youngbin Choi, Seungbeom Lee, Dongwoo Kim

TL;DR
GeoDANO is a novel geometric vision-language model with a domain-agnostic encoder that effectively recognizes geometric features and generalizes across diagram styles, outperforming existing methods in plane geometry problem solving.
Contribution
We introduce GeoDANO, a geometric VLM with a domain-agnostic encoder, and a benchmark for geometric feature recognition, addressing limitations of general-purpose vision encoders.
Findings
GeoCLIP outperforms existing vision encoders in recognizing geometric features.
GeoDANO surpasses specialized methods and GPT-4o on MathVerse.
Our benchmark reveals that vision encoders commonly used in general-purpose VLMs struggle to detect geometric features.
Abstract
We introduce GeoDANO, a geometric vision-language model (VLM) with a domain-agnostic vision encoder, for solving plane geometry problems. Although VLMs have been employed for solving geometry problems, their ability to recognize geometric features remains insufficiently analyzed. To address this gap, we propose a benchmark that evaluates the recognition of visual geometric features, including primitives such as dots and lines, and relations such as orthogonality. Our preliminary study shows that vision encoders often used in general-purpose VLMs, e.g., OpenCLIP, fail to detect these features and struggle to generalize across domains. To overcome this limitation, we develop GeoCLIP, a CLIP-based model trained on synthetic geometric diagram-caption pairs. Benchmark results show that GeoCLIP outperforms existing vision encoders in recognizing geometric features. We then propose our VLM,…
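GeoCLIP is described only as "CLIP-based" and trained on diagram-caption pairs, so its exact objective and hyperparameters are not stated here. As an illustration, the sketch below implements the standard symmetric contrastive (InfoNCE) loss used by CLIP, assuming GeoCLIP follows it: the i-th diagram embedding should score highest against the i-th caption embedding (for example, a hypothetical caption such as "Line AB is perpendicular to line CD"). The embeddings are plain lists standing in for encoder outputs, and the temperature of 0.07 is an assumed value (CLIP's initial setting), not one taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x))) by subtracting the max first."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def clip_loss(image_embs: List[Vector], text_embs: List[Vector],
              temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss; row i of each list forms a positive pair (assumed setup)."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: cosine similarity of diagram i and caption j, divided by temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Diagram-to-caption direction: cross-entropy over each row, target on the diagonal
    loss_i2t = sum(logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    # Caption-to-diagram direction: cross-entropy over each column, target on the diagonal
    loss_t2i = sum(logsumexp([logits[i][j] for i in range(n)]) - logits[j][j]
                   for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


# Toy check: correctly matched pairs yield a much lower loss than swapped captions.
diagrams = [[1.0, 0.0], [0.0, 1.0]]
matched = clip_loss(diagrams, [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_loss(diagrams, [[0.0, 1.0], [1.0, 0.0]])
print(matched < swapped)
```

Averaging both directions penalizes the model whether a diagram picks the wrong caption or a caption picks the wrong diagram, which is why CLIP-style training uses the symmetric form.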
Taxonomy
Topics: Retinal Imaging and Analysis · Optical Coherence Tomography Applications
Methods: Contrastive Language-Image Pre-training
