On the Brittleness of LLMs: A Journey around Set Membership
Lea Hergert, Gábor Berend, Mario Szegedy, Gyorgy Turan, Márk Jelasity

TL;DR
This paper investigates the brittleness of large language models on simple set membership tasks, revealing inconsistent performance and fragmented understanding, thereby highlighting fundamental reliability issues.
Contribution
It introduces a systematic empirical framework for evaluating LLMs on elementary reasoning tasks, exposing their unpredictable failures across various conditions.
Findings
LLMs show consistent brittleness on set membership queries
Performance varies unpredictably with prompt phrasing and model choice
The study maps diverse failure modes of LLM reasoning abilities
Abstract
Large language models (LLMs) achieve superhuman performance on complex reasoning tasks, yet often fail on much simpler problems, raising concerns about their reliability and interpretability. We investigate this paradox through a focused study with two key design features: simplicity, to expose basic failure modes, and scale, to enable comprehensive controlled experiments. We focus on set membership queries, among the most fundamental forms of reasoning, using tasks like "Is apple an element of the set {pear, plum, apple, raspberry}?". We conduct a systematic empirical evaluation across prompt phrasing, semantic structure, element ordering, and model choice. Our large-scale analysis reveals that LLM performance on this elementary task is consistently brittle and unpredictable across all dimensions, suggesting that the models' "understanding" of the set concept is fragmented…
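As a rough illustration of the kind of controlled variation the abstract describes, the sketch below generates set membership prompts across phrasings and element orderings. It is only a minimal sketch: the two templates, the `build_queries` helper, and the `max_orderings` cap are hypothetical and are not taken from the paper, whose actual prompt set is not shown here.

```python
from itertools import permutations

# Hypothetical phrasings for illustration; not the paper's actual templates.
TEMPLATES = [
    "Is {item} an element of the set {{{elements}}}?",
    "Does the set {{{elements}}} contain {item}?",
]


def build_queries(item, elements, max_orderings=3):
    """Return (prompt, expected_answer) pairs across phrasings and orderings."""
    expected = item in elements  # ground truth is independent of phrasing/order
    queries = []
    for i, ordering in enumerate(permutations(elements)):
        if i >= max_orderings:  # cap the number of orderings (assumed limit)
            break
        listed = ", ".join(ordering)
        for template in TEMPLATES:
            prompt = template.format(item=item, elements=listed)
            queries.append((prompt, expected))
    return queries


queries = build_queries("apple", ["pear", "plum", "apple", "raspberry"])
for prompt, expected in queries:
    print(prompt, "->", "yes" if expected else "no")
```

Because the expected answer is the same for every variant of a given item and set, any disagreement among a model's responses to these prompts would be an instance of the brittleness the study measures.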
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI
