GeoCoder: Solving Geometry Problems by Generating Modular Code through Vision-Language Models
Aditya Sharma, Aman Dalmia, Mehran Kazemi, Amal Zouaq, Christopher J. Pal

TL;DR
GeoCoder enhances vision-language models' ability to solve geometry problems by generating and executing modular code with a geometry function library, significantly improving reasoning accuracy over existing methods.
Contribution
Introduces a modular code-finetuning framework and a retrieval-augmented variant, RAG-GeoCoder, to improve geometric reasoning in vision-language models.
Findings
Over 16% average accuracy improvement on the GeomVerse dataset.
Effective deterministic calculations via code execution.
Reduced reliance on parametric memory with RAG-GeoCoder.
Abstract
Geometry problem-solving demands advanced reasoning abilities to process multimodal inputs and employ mathematical knowledge effectively. Vision-language models (VLMs) have made significant progress in various multimodal tasks. Yet, they still struggle with geometry problems and are significantly limited by their inability to perform mathematical operations not seen during pre-training, such as calculating the cosine of an arbitrary angle, and by difficulties in correctly applying relevant geometry formulas. To overcome these challenges, we present GeoCoder, which leverages modular code-finetuning to generate and execute code using a predefined geometry function library. By executing the code, we achieve accurate and deterministic calculations, contrasting the stochastic nature of autoregressive token prediction, while the function library minimizes errors in formula usage. We also…
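To make the idea of modular code generation concrete, here is a minimal sketch of what a small geometry function library and a model-generated solution might look like. The function names, signatures, and the example problem are all hypothetical illustrations and are not taken from the paper's actual library. The point is only that the trigonometry is delegated to executed code rather than predicted token by token.

```python
import math

# Hypothetical geometry function library (illustrative names, not the paper's API).

def law_of_cosines_side(a: float, b: float, angle_deg: float) -> float:
    """Length of the side opposite the angle enclosed by sides a and b."""
    angle = math.radians(angle_deg)
    return math.sqrt(a * a + b * b - 2.0 * a * b * math.cos(angle))

def triangle_area_sas(a: float, b: float, angle_deg: float) -> float:
    """Area of a triangle given two sides and the enclosed angle."""
    return 0.5 * a * b * math.sin(math.radians(angle_deg))

def sector_area(radius: float, angle_deg: float) -> float:
    """Area of a circular sector with the given central angle."""
    return (angle_deg / 360.0) * math.pi * radius ** 2

# Assumed example of code a model could emit for a made-up problem:
# "A triangle has sides 7 and 10 enclosing a 37 degree angle.
#  Find the third side and the area."
def solve() -> dict:
    third_side = law_of_cosines_side(7.0, 10.0, 37.0)
    area = triangle_area_sas(7.0, 10.0, 37.0)
    return {"third_side": third_side, "area": area}

if __name__ == "__main__":
    print(solve())
```

In this sketch the generated code only selects library functions and passes in values read from the problem, so formula errors are confined to the library itself, and the cosine of an arbitrary angle is computed exactly by the interpreter.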
Taxonomy
Topics: Model-Driven Software Engineering Techniques · Constraint Satisfaction and Optimization · Semantic Web and Ontologies
Methods: Lib
