Beyond Memorization: Testing LLM Reasoning on Unseen Theory of Computation Tasks

Shlok Shelat; Jay Raval; Souvik Roy; Manas Gaur

arXiv:2601.13392 · cs.CL · January 21, 2026

Open Access

TL;DR

This paper evaluates large language models' ability to perform formal reasoning on unseen automata construction tasks, revealing significant gaps in understanding despite high performance on familiar problems.

Contribution

It introduces a new benchmark for DFA construction from regular languages, highlighting the limitations of LLMs in generalizing reasoning to unseen, complex problems.

Findings

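01 Models reach perfect accuracy on factual questions and 84-90% on seen construction tasks.

02 Accuracy drops sharply on unseen problems, by 30-64%.

03 Failures stem from systematic misinterpretation of language constraints, incorrect handling of Kleene-star semantics, and a failure to preserve global consistency.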

Abstract

Large language models (LLMs) have demonstrated strong performance on formal language tasks, yet whether this reflects genuine symbolic reasoning or pattern matching on familiar constructions remains unclear. We introduce a benchmark for deterministic finite automata (DFA) construction from regular languages, comprising factual knowledge questions, seen construction problems from public sources, and two types of unseen problems: hand-crafted instances with multiple interacting constraints and systematically generated problems via Arden's theorem. Models achieve perfect accuracy on factual questions and 84-90% on seen tasks. However, accuracy drops sharply on unseen problems (by 30-64%), with failures stemming from systematic misinterpretation of language constraints, incorrect handling of Kleene-star semantics, and a failure to preserve global consistency. We evaluate a three-stage hint…
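
To make the task concrete, here is a minimal sketch of what a DFA construction problem and a correctness check could look like. The target language (strings over {a, b} that end in "ab"), the state names, and the helper functions `accepts` and `disagreements` are all hypothetical and chosen for illustration. The bounded comparison against Python's `re` module is an assumption of this sketch, not the evaluation procedure used in the paper.

```python
import re
from itertools import product

# Hypothetical task: build a DFA for strings over {a, b} that end in "ab".
ALPHABET = "ab"
START = "q0"
ACCEPTING = {"q2"}

# Total transition function: every (state, symbol) pair has exactly one target.
# q0: no useful suffix seen, q1: last symbol was "a", q2: last two were "ab".
TRANSITIONS = {
    ("q0", "a"): "q1",
    ("q0", "b"): "q0",
    ("q1", "a"): "q1",
    ("q1", "b"): "q2",
    ("q2", "a"): "q1",
    ("q2", "b"): "q0",
}


def accepts(word):
    """Run the DFA on `word` and report whether it halts in an accepting state."""
    state = START
    for symbol in word:
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING


def disagreements(pattern, max_len):
    """List every word up to `max_len` where the DFA and the regex disagree.

    This is a bounded test, not a proof of equivalence.
    """
    regex = re.compile(pattern)
    mismatches = []
    for length in range(max_len + 1):
        for letters in product(ALPHABET, repeat=length):
            word = "".join(letters)
            if accepts(word) != (regex.fullmatch(word) is not None):
                mismatches.append(word)
    return mismatches


if __name__ == "__main__":
    print(disagreements(r"(a|b)*ab", 8))
```

An empty list from `disagreements` means the DFA and the regular expression agree on every string up to the chosen length. A DFA that mishandles a constraint or the Kleene star would show up as concrete counterexample strings in that list.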

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning and Algorithms