MATH-PT: A Math Reasoning Benchmark for European and Brazilian Portuguese
Tiago Teixeira, Ana Carolina Erthal, Juan Belieni, Beatriz Canaverde, Diego Mesquita, Miguel Faria, Eliezer de Souza da Silva, André F. T. Martins

TL;DR
This paper introduces Math-PT, a Portuguese mathematical reasoning dataset, and benchmarks current language models, highlighting their strengths and limitations in non-English mathematical tasks.
Contribution
The paper presents Math-PT, a new Portuguese math reasoning dataset, and provides a comprehensive benchmark of LLMs, addressing linguistic bias in existing datasets.
Findings
State-of-the-art models perform well on multiple-choice questions.
Model performance drops on questions with figures or open-ended formats.
Math-PT enables evaluation of LLMs in Portuguese mathematical reasoning.
Abstract
The use of large language models (LLMs) for complex mathematical reasoning is an emergent area of research, with fast progress in methods, models, and benchmark datasets. However, most mathematical reasoning evaluations exhibit a significant linguistic bias, with the vast majority of benchmark datasets being exclusively in English or (at best) translated from English. We address this limitation by introducing Math-PT, a novel dataset comprising 1,729 mathematical problems written in European and Brazilian Portuguese. Math-PT is curated from a variety of high-quality native sources, including mathematical Olympiads, competitions, and exams from Portugal and Brazil. We present a comprehensive benchmark of current state-of-the-art LLMs on Math-PT, revealing that frontier reasoning models achieve strong performance in multiple choice questions compared to open weight…
