MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model

Zhen Yang; Jinhao Chen; Zhengxiao Du; Wenmeng Yu; Weihan Wang; Wenyi Hong; Zhihuan Jiang; Bin Xu; Jie Tang

arXiv:2409.13729·cs.CL·December 3, 2024

Open Access

TL;DR

MathGLM-Vision introduces a multi-modal large language model specialized in mathematics, leveraging a diverse fine-tuning dataset to improve reasoning across various visual mathematical problems.

Contribution

The paper presents MathVL, a new diverse dataset, and MathGLM-Vision, a series of fine-tuned models that enhance multi-modal mathematical reasoning capabilities.

Findings

01

MathGLM-Vision outperforms existing models on benchmarks.

02

Diverse datasets significantly improve reasoning abilities.

03

Fine-tuning with MathVL enhances model performance.

Abstract

Large language models (LLMs) have demonstrated significant capabilities in mathematical reasoning, particularly with text-based mathematical problems. However, current multi-modal large language models (MLLMs), especially those specialized in mathematics, tend to focus predominantly on solving geometric problems but ignore the diversity of visual information available in other areas of mathematics. Moreover, the geometric information for these specialized mathematical MLLMs is derived from several public datasets, which are typically limited in diversity and complexity. To address these limitations, we aim to construct a fine-tuning dataset named MathVL, and develop a series of specialized mathematical MLLMs termed MathGLM-Vision by conducting Supervised Fine-Tuning (SFT) on MathVL with various parameter-scale backbones. To extensively evaluate the effectiveness of MathGLM-Vision, we…

Taxonomy

Topics: Educational Technology Systems · Data Mining and Machine Learning Applications
