Sudoku-Bench: Evaluating creative reasoning with Sudoku variants
Jeffrey Seely, Yuki Imajuku, Tianyu Zhao, Edoardo Cetin, Llion Jones

TL;DR
Sudoku-Bench is a new benchmark designed to evaluate creative and multi-step logical reasoning in large language models using challenging Sudoku variants that require novel problem-solving strategies.
Contribution
The paper introduces Sudoku-Bench, a curated set of unconventional Sudoku puzzles that effectively assess creative reasoning and provide tools for broad research application.
Findings
State-of-the-art LLMs solve fewer than 15% of puzzles unaided.
Sudoku variants resist memorization: each puzzle requires the solver to find a novel logical breakthrough.
The variants share a common, compact structure, which allows clear and consistent evaluation of reasoning ability.
Abstract
Existing reasoning benchmarks for large language models (LLMs) frequently fail to capture authentic creativity, often rewarding memorization of previously observed patterns. We address this shortcoming with Sudoku-Bench, a curated benchmark of challenging and unconventional Sudoku variants specifically selected to evaluate creative, multi-step logical reasoning. Sudoku variants form an unusually effective domain for reasoning research: each puzzle introduces unique or subtly interacting constraints, making memorization infeasible and requiring solvers to identify novel logical breakthroughs ("break-ins"). Despite their diversity, Sudoku variants maintain a common and compact structure, enabling clear and consistent evaluation. Sudoku-Bench includes a carefully chosen puzzle set, a standardized text-based puzzle representation, and flexible tools compatible with thousands of publicly…
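To illustrate why a shared, compact structure makes evaluation straightforward, here is a minimal sketch of checking a candidate answer against the classic Sudoku rules. It assumes a hypothetical flat encoding (81 digits, row by row) and a hypothetical helper name, `is_valid_classic_solution`; neither is taken from Sudoku-Bench itself, and the extra constraints that each variant adds are not modeled.

```python
def is_valid_classic_solution(grid: str) -> bool:
    """Check rows, columns, and 3x3 boxes of a flat 81-character digit string.

    Assumption: the grid is encoded row by row as digits 1-9. This is an
    illustrative format, not necessarily the benchmark's representation.
    """
    if len(grid) != 81 or any(ch not in "123456789" for ch in grid):
        return False
    digits = set("123456789")
    rows = [grid[r * 9:(r + 1) * 9] for r in range(9)]
    cols = [grid[c::9] for c in range(9)]
    boxes = [
        "".join(rows[br + dr][bc:bc + 3] for dr in range(3))
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]
    # Every row, column, and box must contain each digit exactly once.
    return all(set(unit) == digits for unit in rows + cols + boxes)


# A valid grid built from a standard shifting pattern, used only as sample data.
solved = "".join(
    str((r * 3 + r // 3 + c) % 9 + 1) for r in range(9) for c in range(9)
)
# Duplicate the second digit into the first cell to break the first row.
broken = solved[1] + solved[1:]
```

A check of this kind only confirms the final answer; it says nothing about whether the solver found the intended break-in, which is the reasoning the benchmark is meant to probe.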
Taxonomy
Topics: graph theory and CDMA systems
