MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs
Alexander R. Fabbri, Diego Mares, Jorge Flores, Meher Mankikar, Ernesto Hernandez, Dean Lee, Bing Liu, Chen Xing

TL;DR
This paper introduces MultiNRC, a new multilingual reasoning benchmark with native questions in French, Spanish, and Chinese. Evaluations on it show that current LLMs have limited multilingual reasoning ability and struggle with culturally grounded reasoning.
Contribution
The creation of MultiNRC, a culturally grounded multilingual reasoning benchmark, and systematic evaluation of 14 LLMs across native languages and English equivalents.
Findings
LLMs score below 50% on MultiNRC
Models perform better on math reasoning in English than in the original languages (+10%)
LLMs show distinct strengths and weaknesses across reasoning categories
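The English-versus-native comparison above is a difference in accuracy between two versions of the same questions. Below is a minimal sketch of that arithmetic. It assumes each graded answer is stored as a hypothetical (category, language, correct) record and that the gap is reported in accuracy points. The record format and the toy values are illustrative only. They are not MultiNRC data, and the paper's actual grading protocol is not described here.

```python
from collections import defaultdict

# Hypothetical graded records: (category, language, is_correct).
# Toy values for illustration only; NOT results from the paper.
records = [
    ("math", "native", True),
    ("math", "native", False),
    ("math", "english", True),
    ("math", "english", True),
    ("wordplay", "native", False),
    ("wordplay", "native", True),
]


def accuracy_by(records):
    """Return {(category, language): accuracy in percent}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, language, ok in records:
        totals[(category, language)] += 1
        correct[(category, language)] += int(ok)
    return {key: 100.0 * correct[key] / totals[key] for key in totals}


acc = accuracy_by(records)
# Positive gap means the model did better on the English version.
gap = acc[("math", "english")] - acc[("math", "native")]
print(f"math English-minus-native gap: {gap:+.1f} points")
```

With these toy records, native math accuracy is 50% and English math accuracy is 100%, so the script prints a gap of +50.0 points. Grouping by (category, language) keeps the per-category breakdown and the cross-language gap in one table.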
Abstract
Although recent Large Language Models (LLMs) have shown rapid improvement on reasoning benchmarks in English, the evaluation of such LLMs' multilingual reasoning capability across diverse languages and cultural contexts remains limited. Existing multilingual reasoning benchmarks are typically constructed by translating existing English reasoning benchmarks, biasing these benchmarks towards reasoning problems with context in English language/cultures. In this work, we introduce the Multilingual Native Reasoning Challenge (MultiNRC), a benchmark designed to assess LLMs on more than 1,000 native, linguistic and culturally grounded reasoning questions written by native speakers in French, Spanish, and Chinese. MultiNRC covers four core reasoning categories: language-specific linguistic reasoning, wordplay & riddles, cultural/tradition reasoning, and math reasoning with cultural relevance.…
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies
