On the Brittleness of LLMs: A Journey around Set Membership

Lea Hergert; Gábor Berend; Mario Szegedy; Gyorgy Turan; Márk Jelasity

arXiv:2511.12728·cs.CL·November 18, 2025

Open Access

TL;DR

This paper investigates the brittleness of large language models on simple set membership tasks, revealing inconsistent performance and fragmented understanding, thereby highlighting fundamental reliability issues.

Contribution

It introduces a systematic empirical framework for evaluating LLMs on elementary reasoning tasks, exposing their unpredictable failures across various conditions.

Findings

01 LLMs show consistent brittleness on set membership queries

02 Performance varies unpredictably with prompt phrasing and model choice

03 The study maps diverse failure modes of LLM reasoning abilities

Abstract

Large language models (LLMs) achieve superhuman performance on complex reasoning tasks, yet often fail on much simpler problems, raising concerns about their reliability and interpretability. We investigate this paradox through a focused study with two key design features: simplicity, to expose basic failure modes, and scale, to enable comprehensive controlled experiments. We focus on set membership queries, among the most fundamental forms of reasoning, using tasks like "Is apple an element of the set {pear, plum, apple, raspberry}?". We conduct a systematic empirical evaluation across prompt phrasing, semantic structure, element ordering, and model choice. Our large-scale analysis reveals that LLM performance on this elementary task is consistently brittle and unpredictable across all dimensions, suggesting that the models' "understanding" of the set concept is fragmented…
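To make the task concrete, the following is a minimal sketch of how set membership queries could be generated across several phrasings and element orderings, each paired with its ground-truth answer. The templates, the function name `make_queries`, and the `max_orderings` parameter are illustrative assumptions; they are not taken from the paper, whose actual prompts and experimental setup are not given on this page.

```python
from itertools import islice, permutations

# Hypothetical phrasing templates (assumed for illustration, not the paper's prompts).
TEMPLATES = [
    "Is {item} an element of the set {{{members}}}?",
    "Does the set {{{members}}} contain {item}?",
]


def make_queries(item, members, max_orderings=3):
    """Return (prompt, expected_answer) pairs over templates and element orderings."""
    # Ground truth comes from ordinary set membership.
    expected = item in set(members)
    queries = []
    # Take only the first few orderings; the full number grows factorially.
    for ordering in islice(permutations(members), max_orderings):
        joined = ", ".join(ordering)
        for template in TEMPLATES:
            queries.append((template.format(item=item, members=joined), expected))
    return queries


if __name__ == "__main__":
    for prompt, expected in make_queries("apple", ["pear", "plum", "apple", "raspberry"]):
        print(expected, prompt)
```

Each prompt would then be sent to a model and its yes/no reply compared with the expected answer. Holding the set and the queried item fixed while varying only the template or the ordering is one way to isolate the kind of sensitivity the abstract describes.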

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI