Irish-BLiMP: A Linguistic Benchmark for Evaluating Human and Language Model Performance in a Low-Resource Setting
Josh McGiff, Khanh-Tung Tran, William Mulcahy, Dáibhidh Ó Luinín, Jake Dalzell, Róisín Ní Bhroin, Adam Burke, Barry O'Sullivan, Hoang D. Nguyen, Nikola S. Nikolov

TL;DR
Irish-BLiMP is a benchmark of linguistic minimal pairs for evaluating grammatical competence in Irish. It compares fluent human speakers with language models across 11 linguistic features and finds that humans outperform every model tested.
Contribution
This work introduces the first systematic framework and dataset for assessing Irish language understanding in both humans and language models, focusing on low-resource language challenges.
Findings
Humans outperform all models in Irish grammatical tasks.
An 18.1% performance gap separates open- and closed-source LLMs.
Even the strongest model (gpt-5) reaches only 73.5% accuracy, compared with 90.1% for humans.
Abstract
We present Irish-BLiMP (Irish Benchmark of Linguistic Minimal Pairs), the first dataset and framework designed for fine-grained evaluation of linguistic competence in Irish, an endangered language. Drawing on a variety of linguistic literature and grammar reference works, a team of fluent Irish speakers manually constructed and reviewed 1020 minimal pairs across a taxonomy of 11 linguistic features. We evaluate both existing Large Language Models (LLMs) and fluent human participants on their syntactic knowledge of Irish. Our findings show that humans outperform all models across all linguistic features, achieving 16.6% higher accuracy on average. Moreover, a substantial performance gap of 18.1% persists between open- and closed-source LLMs, with even the strongest model (gpt-5) reaching only 73.5% accuracy compared to 90.1% for humans. Interestingly, human…
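To make the minimal-pair setup concrete, here is a rough sketch of how accuracy on such a benchmark is commonly computed: a pair counts as correct when the evaluated system prefers the grammatical sentence over its ungrammatical counterpart. The names `MinimalPair`, `minimal_pair_accuracy`, and the `score` callable are hypothetical, and the sentences and scores are placeholders rather than benchmark data. The abstract does not state how models were queried, so the score-comparison protocol here is an assumption; a model could instead be prompted to pick one sentence of each pair.

```python
from typing import Callable, Iterable, NamedTuple


class MinimalPair(NamedTuple):
    """One benchmark item (field names are hypothetical, not the paper's schema)."""
    feature: str         # linguistic feature the pair targets
    grammatical: str     # acceptable sentence
    ungrammatical: str   # minimally different, unacceptable sentence


def minimal_pair_accuracy(
    pairs: Iterable[MinimalPair],
    score: Callable[[str], float],
) -> float:
    """Fraction of pairs where `score` rates the grammatical sentence higher.

    `score` is assumed to return a higher value for sentences the evaluated
    system finds more acceptable (for example, a sentence log-probability).
    """
    items = list(pairs)
    if not items:
        raise ValueError("at least one minimal pair is required")
    correct = sum(
        1 for p in items if score(p.grammatical) > score(p.ungrammatical)
    )
    return correct / len(items)


# Toy usage with placeholder strings and made-up scores (not real results).
toy_scores = {"good-1": -1.0, "bad-1": -2.0, "good-2": -3.0, "bad-2": -2.5}
toy_pairs = [
    MinimalPair("feature_a", "good-1", "bad-1"),  # scorer prefers the good one
    MinimalPair("feature_b", "good-2", "bad-2"),  # scorer prefers the bad one
]
toy_accuracy = minimal_pair_accuracy(toy_pairs, toy_scores.__getitem__)
```

Grouping the pairs by `feature` before calling the function would give the kind of per-feature breakdown the abstract refers to.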
