Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models

Simeng Han; Howard Dai; Stephen Xia; Grant Zhang; Chen Liu; Lichang Chen; Hoang Huy Nguyen; Hongyuan Mei; Jiayuan Mao; R. Thomas McCoy

arXiv:2505.10844·cs.AI·October 30, 2025

TL;DR

This paper introduces a brainteaser benchmark for analyzing the reasoning strategies of large language models. It shows that the models can generate creative solutions but also fall back on brute force in some cases.

Contribution

It presents a novel benchmark using narrative brainteasers to evaluate reasoning strategies and assesses LLMs' creativity and problem-solving approaches.

Findings

01 LLMs can produce creative, insightful solutions to brainteasers.

02 Models sometimes rely on brute force despite available creative solutions.

03 The benchmark reveals strengths and limitations in LLM reasoning abilities.

Abstract

Accuracy remains a standard metric for evaluating AI systems, but it offers limited insight into how models arrive at their solutions. In this work, we introduce a benchmark based on brainteasers written in long narrative form to probe more deeply into the types of reasoning strategies that models use. Brainteasers are well-suited for this goal because they can be solved with multiple approaches, such as a few-step solution that uses a creative insight or a longer solution that uses more brute force. We investigate large language models (LLMs) across multiple layers of reasoning, focusing not only on correctness but also on the quality and creativity of their solutions. We investigate many aspects of the reasoning process: (1) semantic parsing of the brainteasers into precise mathematical competition style formats; (2) generating solutions from these mathematical forms; (3)…
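To make the contrast between an insight-based solution and a brute-force one concrete, here is a minimal sketch using a classic puzzle that is not taken from the paper's benchmark: how many matches are needed to decide a single-elimination tournament? The function names `matches_brute_force` and `matches_insight` are hypothetical and chosen only for this illustration. The brute-force version simulates every round, while the insight version uses the observation that each match eliminates exactly one player, so a field of n players needs n - 1 matches.

```python
def matches_brute_force(players: int) -> int:
    """Count matches by simulating the tournament round by round."""
    matches = 0
    remaining = players
    while remaining > 1:
        played = remaining // 2   # pairs that play this round; an odd player gets a bye
        matches += played
        remaining -= played       # each match eliminates exactly one player
    return matches


def matches_insight(players: int) -> int:
    """Count matches with the one-step insight: every player but the winner loses once."""
    return max(players - 1, 0)


if __name__ == "__main__":
    for n in (2, 7, 100):
        print(n, matches_brute_force(n), matches_insight(n))
```

Both functions return the same answer, so accuracy alone cannot tell them apart. The difference lies in the length and character of the route to the answer, which is the kind of distinction the abstract says the benchmark is designed to expose.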

Code & Models

Datasets

ChenLiu1996/Brainteaser
dataset · 13 downloads

Videos

Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models · SlidesLive

Taxonomy

Topics: Machine Learning in Materials Science · Artificial Intelligence in Healthcare and Education · Topic Modeling