Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models

Wenhao Shi; Zhiqiang Hu; Yi Bin; Junhua Liu; Yang Yang; See-Kiong Ng; Lidong Bing; Roy Ka-Wei Lee

arXiv:2406.17294 · cs.CL · October 10, 2024

TL;DR

Math-LLaVA introduces a new multimodal dataset, MathV360K, and a model fine-tuned on it. Together they improve the mathematical reasoning of multimodal large language models by training on diverse, high-quality visual question-answer pairs.

Contribution

The paper presents MathV360K, a large, diverse dataset of multimodal mathematical question-answer pairs, and Math-LLaVA, a LLaVA-1.5-based model fine-tuned on that dataset to improve multimodal mathematical reasoning.

Findings

01. 19-point accuracy gain over the LLaVA-1.5 base model on the MathVista minitest split

02. Performance comparable to GPT-4V on the MathVista minitest split

03. Outperforms existing models on Math-V, MathVerse, and MMMU benchmarks

Abstract

Large language models (LLMs) have demonstrated impressive reasoning capabilities, particularly in textual mathematical problem-solving. However, existing open-source image instruction fine-tuning datasets, containing limited question-answer pairs per image, do not fully exploit visual information to enhance the multimodal mathematical reasoning capabilities of Multimodal LLMs (MLLMs). To bridge this gap, we address the lack of high-quality, diverse multimodal mathematical datasets by collecting 40K high-quality images with question-answer pairs from 24 existing datasets and synthesizing 320K new pairs, creating the MathV360K dataset, which enhances both the breadth and depth of multimodal mathematical questions. We introduce Math-LLaVA, a LLaVA-1.5-based model fine-tuned with MathV360K. This novel approach significantly improves the multimodal mathematical reasoning capabilities of…
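The dataset name follows from the counts given in the abstract. The short sketch below only restates that arithmetic; the counts are the rounded figures quoted above, and the function and key names are hypothetical, chosen for illustration rather than taken from the authors' code.

```python
# Rounded counts as quoted in the abstract (not exact dataset sizes).
COLLECTED_PAIRS = 40_000      # image QA pairs selected from 24 existing datasets
SYNTHESIZED_PAIRS = 320_000   # newly synthesized QA pairs


def composition(collected: int, synthesized: int) -> dict:
    """Summarize how the two sources combine (hypothetical helper)."""
    total = collected + synthesized
    return {
        "total": total,
        "synthesized_share": synthesized / total,
        "synthesized_per_collected": synthesized / collected,
    }


stats = composition(COLLECTED_PAIRS, SYNTHESIZED_PAIRS)
print(stats)
```

Under these rounded figures, the total is 360K pairs, with eight synthesized pairs for every collected one.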

Code & Models

Repositories

hzq950419/math-llava
PyTorch · Official

Models

🤗 Zhiqiang007/Math-LLaVA
model · 18 dl · ♡ 5

Datasets

Zhiqiang007/MathV360K
dataset · 160 dl

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling