GRILE: A Benchmark for Grammar Reasoning and Explanation in Romanian LLMs

Adrian-Marius Dumitran; Alexandra-Mihaela Danila; Angela-Liliana Dumitran

arXiv:2508.14279 · cs.CL · September 30, 2025

TL;DR

GRILE is a new benchmark for evaluating Romanian language models on grammar reasoning and explanations, revealing current limitations and guiding future educational NLP research in low-resource languages.

Contribution

Introduces GRILE, the first open benchmark with 1,151 Romanian exam questions to assess LLMs' answer accuracy and explanation quality, highlighting systematic weaknesses.

Findings

01 Gemini 2.5 Pro achieves 83% accuracy

02 Most open-weight models score below 65%

03 48% of explanations contain factual or pedagogical flaws, according to expert review

Abstract

Large language models (LLMs) have revolutionized natural language processing (NLP), yet their pedagogical value for low-resource languages remains unclear. We present GRILE (Grammar Romanian Inference and Language Explanations), the first open benchmark of 1,151 multiple-choice questions harvested from Romanian high-stakes exams (National Evaluation, Baccalaureate, university admissions). GRILE enables us to probe two complementary abilities of seven state-of-the-art multilingual and Romanian-specific LLMs: (i) selecting the correct answer, and (ii) producing linguistically accurate explanations. While Gemini 2.5 Pro reaches 83% accuracy, most open-weight models stay below 65%, and 48% of their explanations contain factual or pedagogical flaws according to expert review. A detailed error analysis pinpoints systematic weaknesses in morphology and in applying the latest DOOM3…
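
The two abilities the abstract describes, answer selection and explanation quality, reduce to two simple rates. The following is a minimal sketch of how such rates could be computed, not the authors' evaluation code. The record layout (`gold`, `predicted`, `explanation_flawed`) and the toy data are assumptions made for illustration and are not taken from the GRILE release.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScoredItem:
    """One multiple-choice question with a model response (hypothetical layout)."""
    question_id: str
    gold: str                 # label of the correct option, e.g. "b"
    predicted: str            # label of the option the model selected
    explanation_flawed: bool  # assumed expert judgement of the explanation


def accuracy(items: Sequence[ScoredItem]) -> float:
    """Fraction of items where the predicted label matches the gold label."""
    if not items:
        return 0.0
    correct = sum(
        1 for it in items
        if it.predicted.strip().lower() == it.gold.strip().lower()
    )
    return correct / len(items)


def flaw_rate(items: Sequence[ScoredItem]) -> float:
    """Fraction of items whose explanation was marked as flawed."""
    if not items:
        return 0.0
    return sum(1 for it in items if it.explanation_flawed) / len(items)


# Toy data, invented for this sketch only.
sample = [
    ScoredItem("q1", gold="a", predicted="a", explanation_flawed=False),
    ScoredItem("q2", gold="c", predicted="C ", explanation_flawed=True),
    ScoredItem("q3", gold="b", predicted="d", explanation_flawed=True),
    ScoredItem("q4", gold="d", predicted="d", explanation_flawed=False),
]

print(f"accuracy:  {accuracy(sample):.2f}")
print(f"flaw rate: {flaw_rate(sample):.2f}")
```

Keeping the two rates separate reflects the abstract's point that a model can pick the right option and still give a flawed explanation, as in the toy item `q2`.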

Taxonomy

Topics: Natural Language Processing Techniques · Translation Studies and Practices · Legal Language and Interpretation