MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model
Zhen Yang, Jinhao Chen, Zhengxiao Du, Wenmeng Yu, Weihan Wang, Wenyi Hong, Zhihuan Jiang, Bin Xu, Jie Tang

TL;DR
MathGLM-Vision introduces a multi-modal large language model specialized in mathematics, leveraging a diverse fine-tuning dataset to improve reasoning across various visual mathematical problems.
Contribution
The paper presents MathVL, a new diverse dataset, and MathGLM-Vision, a series of fine-tuned models that enhance multi-modal mathematical reasoning capabilities.
Findings
MathGLM-Vision outperforms existing models on benchmarks.
Diverse datasets significantly improve reasoning abilities.
Fine-tuning with MathVL enhances model performance.
Abstract
Large language models (LLMs) have demonstrated significant capabilities in mathematical reasoning, particularly with text-based mathematical problems. However, current multi-modal large language models (MLLMs), especially those specialized in mathematics, tend to focus predominantly on solving geometric problems but ignore the diversity of visual information available in other areas of mathematics. Moreover, the geometric information for these specialized mathematical MLLMs is derived from several public datasets, which are typically limited in diversity and complexity. To address these limitations, we aim to construct a fine-tuning dataset named MathVL, and develop a series of specialized mathematical MLLMs termed MathGLM-Vision by conducting Supervised Fine-Tuning (SFT) on MathVL with various parameter-scale backbones. To extensively evaluate the effectiveness of MathGLM-Vision, we…
Taxonomy
Topics: Educational Technology Systems · Data Mining and Machine Learning Applications
