Geo-LLaVA: A Large Multi-Modal Model for Solving Geometry Math Problems with Meta In-Context Learning
Shihao Xu, Yiyang Luo, Wei Shi

TL;DR
This paper introduces Geo-LLaVA, a multi-modal model that combines visual reasoning and language understanding to solve complex geometry problems, including solid geometry, with state-of-the-art accuracy.
Contribution
It presents a new multi-modal model and a comprehensive geometry dataset, enabling LLMs to effectively handle visual and spatial reasoning in geometry problems.
Findings
Achieved 65.25% accuracy on the GeoQA dataset
Achieved 42.36% accuracy on the GeoMath dataset
Enabled the model to generate geometry diagrams and reasoning steps
Abstract
Geometry mathematics problems pose significant challenges for large language models (LLMs) because they involve visual elements and spatial reasoning. Current methods primarily rely on symbolic character awareness to address these problems. Because geometry problem solving is a relatively nascent field with few suitable datasets and almost no work on solid geometry, we collect a geometry question-answer dataset, referred to as GeoMath, by sourcing geometric data from Chinese high school education websites. It contains solid geometry questions and answers with accurate reasoning steps, complementing existing plane geometry datasets. Additionally, we propose a Large Multi-modal Model (LMM) framework named Geo-LLaVA, which incorporates retrieval augmentation with supervised fine-tuning (SFT) in the training stage, called meta-training, and employs…
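The abstract is cut off before it describes the method in detail, so the following is only a minimal, text-only sketch of the general idea of retrieval augmentation for in-context learning: look up the solved problems most similar to a new question and prepend them to the prompt. It is not the authors' implementation. The bag-of-words similarity, the function names (`retrieve`, `build_prompt`), and the three-item example pool are all assumptions made for illustration; Geo-LLaVA itself is multi-modal and would also involve the diagram.

```python
import math
from collections import Counter

# Hypothetical pool of solved problems (illustrative only, not taken from GeoMath).
POOL = [
    {"question": "Find the volume of a cone with radius 3 and height 4", "answer": "12*pi"},
    {"question": "Find the area of a triangle with base 6 and height 2", "answer": "6"},
    {"question": "Find the surface area of a sphere with radius 2", "answer": "16*pi"},
]


def bag_of_words(text):
    """Lowercased token counts; a crude stand-in for a learned embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, pool, k=2):
    """Return the k pool entries whose questions are most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(
        pool,
        key=lambda ex: cosine(q, bag_of_words(ex["question"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, pool, k=2):
    """Prepend the retrieved solved examples to the new question."""
    parts = [
        f"Question: {ex['question']}\nAnswer: {ex['answer']}"
        for ex in retrieve(query, pool, k)
    ]
    parts.append(f"Question: {query}\nAnswer:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("Find the volume of a cone with radius 2 and height 6", POOL))
```

In a setup like the one the abstract describes, prompts of this shape would be used during supervised fine-tuning as well as at inference time, so that the model learns to make use of retrieved examples. How Geo-LLaVA actually does this is not recoverable from the truncated text.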
