IRLBench: A Multi-modal, Culturally Grounded, Parallel Irish-English Benchmark for Open-Ended LLM Reasoning Evaluation
Khanh-Tung Tran, Barry O'Sullivan, Hoang D. Nguyen

TL;DR
IRLBench is a culturally grounded, bilingual Irish-English benchmark for evaluating large language models' reasoning and language fidelity in low-resource, multilingual settings, revealing significant performance gaps especially in Irish.
Contribution
Introduces IRLBench, a novel Irish-English benchmark based on Irish exam data, enabling detailed evaluation of LLMs in low-resource, culturally specific contexts.
Findings
Models produce valid Irish responses less than 80% of the time
The best models answer 55.8% of questions correctly in Irish vs. 76.2% in English
Benchmark supports comprehensive, culturally aware multilingual evaluation
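The two headline numbers above can be read as simple per-language rates. The following is a minimal sketch of how such rates and the Irish-English gap might be tallied. It assumes each model response has already been judged for language validity and correctness. The `Response` record, its field names, and the boolean notion of correctness are hypothetical illustrations, not the paper's actual scoring pipeline, which relies on the official marking scheme.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical record for one judged model response (not from the paper)."""
    language: str             # target language of the question: "irish" or "english"
    in_target_language: bool  # assumed flag: response was written in the target language
    correct: bool             # assumed flag: response was judged correct


def summarize(responses, language):
    """Return the valid-language rate and correct rate (in percent) for one language."""
    subset = [r for r in responses if r.language == language]
    if not subset:
        raise ValueError(f"no responses for language {language!r}")
    n = len(subset)
    return {
        "valid_rate": 100.0 * sum(r.in_target_language for r in subset) / n,
        "correct_rate": 100.0 * sum(r.correct for r in subset) / n,
    }


def gap_in_points(english_pct, irish_pct):
    """Difference between two percentages, in percentage points."""
    return round(english_pct - irish_pct, 1)


# Gap implied by the figures reported in the findings above.
reported_gap = gap_in_points(76.2, 55.8)

# Toy data, invented purely to exercise the functions.
toy = [
    Response("irish", True, True),
    Response("irish", True, True),
    Response("irish", True, False),
    Response("irish", False, False),
]
toy_summary = summarize(toy, "irish")
```

With the reported figures, the gap between English and Irish correctness comes to 20.4 percentage points.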
Abstract
Recent advances in Large Language Models (LLMs) have demonstrated promising knowledge and reasoning abilities, yet their performance in multilingual and low-resource settings remains underexplored. Existing benchmarks often exhibit cultural bias, restrict evaluation to text-only, rely on multiple-choice formats, and, more importantly, are limited for extremely low-resource languages. To address these gaps, we introduce IRLBench, presented in parallel English and Irish, the latter of which is considered definitely endangered by UNESCO. Our benchmark consists of 12 representative subjects developed from the 2024 Irish Leaving Certificate exams, enabling fine-grained analysis of model capabilities across domains. By framing the task as long-form generation and leveraging the official marking scheme, it supports a comprehensive evaluation not only of correctness but also of language fidelity. Our…
Taxonomy
Topics: Natural Language Processing Techniques · Linguistics and Terminology Studies · Translation Studies and Practices
Methods: Attentive Walk-Aggregating Graph Neural Network
