IOLBENCH: Benchmarking LLMs on Linguistic Reasoning

Satyam Goyal; Soham Dan

arXiv:2501.04249 · cs.CL · September 16, 2025

TL;DR

This paper introduces IOLBENCH, a benchmark built from International Linguistics Olympiad (IOL) problems, to evaluate how well large language models reason about linguistic tasks. The results reveal the current limitations of these models.

Contribution

The paper presents IOLBENCH, a novel dataset for testing LLMs on diverse linguistic reasoning tasks, and provides comprehensive benchmarking results highlighting models' strengths and weaknesses.

Findings

1. LLMs struggle with complex linguistic reasoning tasks.
2. Models show limited ability in rule abstraction and compositional generalization.
3. The benchmark reveals significant gaps in current models' linguistic reasoning abilities.

Abstract

Despite the remarkable advancements and widespread applications of deep neural networks, their ability to perform reasoning tasks remains limited, particularly in domains requiring structured, abstract thought. In this paper, we investigate the linguistic reasoning capabilities of state-of-the-art large language models (LLMs) by introducing IOLBENCH, a novel benchmark derived from International Linguistics Olympiad (IOL) problems. This dataset encompasses diverse problems testing syntax, morphology, phonology, and semantics, all carefully designed to be self-contained and independent of external knowledge. These tasks challenge models to engage in metacognitive linguistic reasoning, requiring the deduction of linguistic rules and patterns from minimal examples. Through extensive benchmarking of leading LLMs, we find that even the most advanced models struggle to handle the intricacies…
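To make "deduction of linguistic rules and patterns from minimal examples" concrete, here is a minimal Python sketch of the simplest version of such a task: inferring a single suffix rule from a few word pairs and applying it to an unseen word. The word forms are invented for illustration and do not come from IOLBENCH, and the helper names `induce_suffix_rule` and `apply_rule` are hypothetical. Actual IOL problems are far richer than this.

```python
def induce_suffix_rule(pairs):
    """Return the one suffix that turns every base form into its derived form.

    Returns None when the pairs are not explained by a single shared suffix.
    """
    suffixes = set()
    for base, derived in pairs:
        # A pure suffix rule requires the derived form to start with the base.
        if not derived.startswith(base):
            return None
        suffixes.add(derived[len(base):])
    # The rule is well defined only if all pairs agree on the same suffix.
    return suffixes.pop() if len(suffixes) == 1 else None


def apply_rule(suffix, word):
    """Apply the induced suffix rule to a new base form."""
    return word + suffix


# Invented toy data (assumption): base form -> derived form in a made-up language.
examples = [("kato", "katomi"), ("silu", "silumi"), ("rena", "renami")]

suffix = induce_suffix_rule(examples)
prediction = apply_rule(suffix, "dova")
print(suffix, prediction)
```

The sketch covers only one kind of pattern (a fixed suffix). It is meant to show the shape of the task, where a rule is inferred from a handful of examples and then applied to a held-out item, and not how the paper evaluates models.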

Code & Models

Repositories

satgoy152/ling_llm (official)

Taxonomy

Topics: Natural Language Processing Techniques