MATH-PT: A Math Reasoning Benchmark for European and Brazilian Portuguese

Tiago Teixeira, Ana Carolina Erthal, Juan Belieni, Beatriz Canaverde, Diego Mesquita, Miguel Faria, Eliezer de Souza da Silva, André F. T. Martins

arXiv:2604.25926 · cs.CL · April 30, 2026

TL;DR

This paper introduces Math-PT, a Portuguese mathematical reasoning dataset, and benchmarks current language models, highlighting their strengths and limitations in non-English mathematical tasks.

Contribution

The paper presents Math-PT, a new Portuguese math reasoning dataset, and provides a comprehensive benchmark of LLMs on it, addressing the linguistic bias of existing datasets, which are overwhelmingly in English or translated from English.

Findings

1. State-of-the-art models perform well on multiple-choice questions.
2. Model performance drops on questions with figures or with open-ended formats.
3. Math-PT enables evaluation of LLMs on mathematical reasoning in Portuguese.

Abstract

The use of large language models (LLMs) for complex mathematical reasoning is an emerging area of research, with fast progress in methods, models, and benchmark datasets. However, most mathematical reasoning evaluations exhibit a significant linguistic bias: the vast majority of benchmark datasets are exclusively in English or, at best, translated from English. We address this limitation by introducing Math-PT, a novel dataset comprising 1,729 mathematical problems written in European and Brazilian Portuguese. Math-PT is curated from a variety of high-quality native sources, including mathematical Olympiads, competitions, and exams from Portugal and Brazil. We present a comprehensive benchmark of current state-of-the-art LLMs on Math-PT, revealing that frontier reasoning models achieve strong performance on multiple-choice questions compared to open-weight…
