MMATH: A Multilingual Benchmark for Mathematical Reasoning

Wenyang Luo; Wayne Xin Zhao; Jing Sha; Shijin Wang; Ji-Rong Wen

arXiv:2505.19126 · cs.CL · May 27, 2025

TL;DR

This paper introduces MMATH, a multilingual benchmark for complex mathematical reasoning across 10 typologically diverse languages. It reveals substantial performance disparities between languages and proposes strategies to improve multilingual reasoning in large language models.

Contribution

The paper presents MMATH, the first comprehensive multilingual benchmark for complex reasoning, and explores prompting and training methods to enhance the multilingual reasoning capabilities of large reasoning models.

Findings

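01. Models show significant performance gaps across languages, even advanced ones such as DeepSeek R1.

02. Prompting and training strategies improve multilingual reasoning; in particular, reasoning in English and answering in the target language improves performance while keeping responses in the target language.

03. Models frequently produce off-target responses in an unintended language; this language-consistency issue is identified and mitigated.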
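To make "performance gap across languages" concrete, here is a minimal Python sketch, assuming graded results are available as per-problem records holding a language code and a correctness flag. The `Result` record, the function names, and the language codes are hypothetical illustrations, not the MMATH repository's API, and the best-minus-worst spread is only one possible way to summarize the gap.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Result(NamedTuple):
    """One graded response (hypothetical record format, not MMATH's schema)."""
    language: str  # language code of the problem, e.g. "en"
    correct: bool  # True if the final answer matched the reference


def per_language_accuracy(results: Iterable[Result]) -> dict[str, float]:
    """Return the fraction of correctly answered problems per language."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for r in results:
        totals[r.language] += 1
        hits[r.language] += int(r.correct)
    return {lang: hits[lang] / totals[lang] for lang in totals}


def accuracy_gap(acc: dict[str, float]) -> float:
    """Spread between the best and worst language; 0.0 means no disparity."""
    return max(acc.values()) - min(acc.values())


# Illustrative data only; these are not MMATH results.
results = [Result("en", True), Result("en", True), Result("zh", True), Result("zh", False)]
acc = per_language_accuracy(results)
print(acc)                # {'en': 1.0, 'zh': 0.5}
print(accuracy_gap(acc))  # 0.5
```

A single spread number hides which languages lag, so in practice one would also report the full per-language table returned by `per_language_accuracy`.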

Abstract

The advent of large reasoning models, such as OpenAI o1 and DeepSeek R1, has significantly advanced complex reasoning tasks. However, their capabilities in multilingual complex reasoning remain underexplored, with existing efforts largely focused on simpler tasks like MGSM. To address this gap, we introduce MMATH, a benchmark for multilingual complex reasoning spanning 374 high-quality math problems across 10 typologically diverse languages. Using MMATH, we observe that even advanced models like DeepSeek R1 exhibit substantial performance disparities across languages and suffer from a critical off-target issue: generating responses in unintended languages. To address this, we explore strategies including prompting and training, demonstrating that reasoning in English and answering in target languages can simultaneously enhance performance and preserve target-language consistency. Our…
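The "reason in English, answer in the target language" strategy and the off-target issue described in the abstract can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions: the prompt template wording and all function names are hypothetical, not the paper's actual prompt or code. The off-target check uses the first word of each letter's Unicode character name as a coarse script label, so it cannot distinguish languages that share a script (for example English and French); a real evaluation would need a proper language-identification model.

```python
import unicodedata

# Hypothetical instruction template; not the prompt used in the paper.
TEMPLATE = (
    "{question}\n\n"
    "Think through the problem step by step in English, "
    "then write your final answer in {language}."
)


def build_prompt(question: str, language: str) -> str:
    """Request English reasoning followed by a target-language answer."""
    return TEMPLATE.format(question=question, language=language)


def dominant_script(text: str) -> str:
    """Crude script detector: most frequent Unicode name prefix among letters.

    unicodedata.name() gives names such as 'LATIN SMALL LETTER A' or
    'CJK UNIFIED IDEOGRAPH-4E2D'; the first word is used as a rough script label.
    """
    counts: dict[str, int] = {}
    for ch in text:
        if ch.isalpha():
            script = unicodedata.name(ch, "UNKNOWN").split()[0]
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else "NONE"


def is_off_target(answer: str, expected_script: str) -> bool:
    """Flag an answer whose dominant script differs from the expected one."""
    return dominant_script(answer) != expected_script


print(build_prompt("What is 7 * 6?", "Chinese"))
print(is_off_target("答案是 42", "CJK"))         # False
print(is_off_target("The answer is 42", "CJK"))  # True
```

Checking only the final answer, rather than the English reasoning trace, matches the idea that the reasoning language and the answer language are allowed to differ under this strategy.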

Code & Models

Repositories

rucaibox/mmath (official)

Videos

MMATH: A Multilingual Benchmark for Mathematical Reasoning (Underline)

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Mathematics Education and Teaching Techniques