MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs

Alexander R. Fabbri; Diego Mares; Jorge Flores; Meher Mankikar; Ernesto Hernandez; Dean Lee; Bing Liu; Chen Xing

arXiv:2507.17476 · cs.CL · July 24, 2025

Open Access

TL;DR

This paper introduces MultiNRC, a multilingual reasoning benchmark of native questions in French, Spanish, and Chinese. Evaluation on it shows that current LLMs remain limited in multilingual reasoning, including culturally grounded reasoning.

Contribution

The creation of MultiNRC, a culturally grounded multilingual reasoning benchmark, and systematic evaluation of 14 LLMs across native languages and English equivalents.

Findings

01. LLMs score below 50% on MultiNRC

02. Models perform better on English math reasoning (+10%)

03. Distinct strengths and weaknesses in reasoning categories

Abstract

Although recent Large Language Models (LLMs) have shown rapid improvement on reasoning benchmarks in English, the evaluation of such LLMs' multilingual reasoning capability across diverse languages and cultural contexts remains limited. Existing multilingual reasoning benchmarks are typically constructed by translating existing English reasoning benchmarks, biasing these benchmarks towards reasoning problems with context in English language/cultures. In this work, we introduce the Multilingual Native Reasoning Challenge (MultiNRC), a benchmark designed to assess LLMs on more than 1,000 native, linguistically and culturally grounded reasoning questions written by native speakers in French, Spanish, and Chinese. MultiNRC covers four core reasoning categories: language-specific linguistic reasoning, wordplay & riddles, cultural/tradition reasoning, and math reasoning with cultural relevance.…

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies