Geo-LLaVA: A Large Multi-Modal Model for Solving Geometry Math Problems with Meta In-Context Learning

Shihao Xu; Yiyang Luo; Wei Shi

arXiv:2412.10455 · cs.CV · December 17, 2024

TL;DR

This paper introduces Geo-LLaVA, a multi-modal model that combines visual reasoning and language understanding to solve complex geometry problems, including solid geometry, with state-of-the-art accuracy.

Contribution

It presents a new multi-modal model, Geo-LLaVA, and a geometry dataset, GeoMath, that covers solid geometry, enabling LLMs to handle the visual and spatial reasoning that geometry problems require.

Findings

01. Achieved 65.25% accuracy on the GeoQA dataset
02. Achieved 42.36% accuracy on the GeoMath dataset
03. Enabled the model to generate geometry diagrams and reasoning steps

Abstract

Geometry problems pose significant challenges for large language models (LLMs) because they involve visual elements and spatial reasoning. Current methods rely primarily on symbolic character awareness to address them. Because geometry problem solving is a relatively nascent field with few suitable datasets and almost no work on solid geometry, we collect GeoMath, a geometry question-answer dataset sourced from Chinese high school education websites. It contains solid geometry questions and answers with accurate reasoning steps, complementing existing plane geometry datasets. Additionally, we propose a Large Multi-modal Model (LMM) framework named Geo-LLaVA, which combines retrieval augmentation with supervised fine-tuning (SFT) in the training stage, a procedure called meta-training, and employs…
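To make the idea of retrieval augmentation concrete, here is a minimal sketch of one common way it can be done: find the solved problem most similar to a new question and prepend it to the prompt as a worked example. This is not the paper's implementation. The bag-of-words cosine similarity, the prompt template, the function names (`retrieve`, `build_prompt`), and the two-item `POOL` are all assumptions made for illustration; the abstract does not say which retriever or template Geo-LLaVA uses, and the sketch covers text only, not images.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, pool, k=1):
    """Return the k pool entries whose question text is closest to the query."""
    q = bag_of_words(query)
    ranked = sorted(
        pool,
        key=lambda ex: cosine(q, bag_of_words(ex["question"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, examples):
    """Prepend retrieved question/solution pairs to the new question."""
    parts = [
        f"Question: {ex['question']}\nSolution: {ex['solution']}"
        for ex in examples
    ]
    parts.append(f"Question: {query}\nSolution:")
    return "\n\n".join(parts)


# Hypothetical pool of solved problems (not taken from GeoMath or GeoQA).
POOL = [
    {"question": "A cube has edge length 2. Find its volume.",
     "solution": "V = 2^3 = 8."},
    {"question": "A circle has radius 3. Find its area.",
     "solution": "A = pi * 3^2 = 9 pi."},
]

query = "A cube has edge length 5. Find its volume."
prompt = build_prompt(query, retrieve(query, POOL, k=1))
print(prompt)
```

In this toy setup the cube question is retrieved ahead of the circle question because it shares more tokens with the query. A real system would more plausibly use learned embeddings over both the diagram and the text, which this sketch does not attempt.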
