IOLBENCH: Benchmarking LLMs on Linguistic Reasoning
Satyam Goyal, Soham Dan

TL;DR
This paper introduces IOLBENCH, a new benchmark built from International Linguistics Olympiad (IOL) problems, to evaluate and analyze the linguistic reasoning capabilities of large language models and to reveal their current limitations.
Contribution
The paper presents IOLBENCH, a novel dataset for testing LLMs on diverse linguistic reasoning tasks, and provides comprehensive benchmarking results highlighting models' strengths and weaknesses.
Findings
LLMs struggle with complex linguistic reasoning tasks.
Models show limited ability in rule abstraction and compositional generalization.
The benchmark reveals significant gaps in current models' linguistic reasoning abilities.
Abstract
Despite the remarkable advancements and widespread applications of deep neural networks, their ability to perform reasoning tasks remains limited, particularly in domains requiring structured, abstract thought. In this paper, we investigate the linguistic reasoning capabilities of state-of-the-art large language models (LLMs) by introducing IOLBENCH, a novel benchmark derived from International Linguistics Olympiad (IOL) problems. This dataset encompasses diverse problems testing syntax, morphology, phonology, and semantics, all carefully designed to be self-contained and independent of external knowledge. These tasks challenge models to engage in metacognitive linguistic reasoning, requiring the deduction of linguistic rules and patterns from minimal examples. Through extensive benchmarking of leading LLMs, we find that even the most advanced models struggle to handle the intricacies…
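To make "deduction of linguistic rules and patterns from minimal examples" concrete, here is a minimal sketch of the kind of inference such problems call for. Everything in it is an assumption made for illustration: the word forms are invented and do not come from IOLBENCH or any real language, and the helper names `induce_suffix_rule` and `apply_rule` are hypothetical. Actual IOL problems are far richer than a single suffix rule.

```python
# Toy illustration only: the data and helper names are invented,
# not taken from IOLBENCH or from any real language.

def induce_suffix_rule(pairs):
    """Return the one suffix that maps every base form to its derived form.

    Returns None if the pairs are not explained by a single shared suffix.
    """
    suffixes = set()
    for base, derived in pairs:
        if not derived.startswith(base):
            return None  # not a pure suffixation pattern
        suffixes.add(derived[len(base):])
    return suffixes.pop() if len(suffixes) == 1 else None


def apply_rule(word, suffix):
    """Apply the induced suffix rule to an unseen base form."""
    return word + suffix


# A handful of (base, derived) pairs in an invented language.
examples = [("kamu", "kamuri"), ("teso", "tesori"), ("lin", "linri")]

suffix = induce_suffix_rule(examples)    # rule abstracted from the examples
prediction = apply_rule("mopa", suffix)  # generalization to a new word
print(suffix, prediction)
```

The two steps mirror the skills the summary names: abstracting a rule from a few observations, then applying it to an unseen form.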
Taxonomy
Topics: Natural Language Processing Techniques
