What to Ask Next? Probing the Imaginative Reasoning of LLMs with TurtleSoup Puzzles

Mengtao Zhou; Sifan Wu; Huan Zhang; Qi Sima; Bang Liu

arXiv:2508.10358 · cs.AI · August 15, 2025

TL;DR

This paper introduces TurtleSoup-Bench, a bilingual interactive benchmark, and Mosaic-Agent, a novel evaluation agent, to assess the imaginative reasoning capabilities of Large Language Models through dynamic, exploratory puzzles.

Contribution

The paper presents the first large-scale, bilingual benchmark for imaginative reasoning, together with an agent designed to evaluate LLMs in a dynamic, hypothesis-driven environment.

Findings

01. LLMs show significant limitations in imaginative reasoning.
02. The analysis identifies common failure patterns in LLMs' reasoning processes.
03. There is a performance gap between LLMs and humans in reasoning capability.

Abstract

We investigate the capacity of Large Language Models (LLMs) for imaginative reasoning: the proactive construction, testing, and revision of hypotheses in information-sparse environments. Existing benchmarks, often static or focused on social deduction, fail to capture the dynamic, exploratory nature of this reasoning process. To address this gap, we introduce a comprehensive research framework based on the classic "Turtle Soup" game, integrating a benchmark, an agent, and an evaluation protocol. We present TurtleSoup-Bench, the first large-scale, bilingual, interactive benchmark for imaginative reasoning, comprising 800 Turtle Soup puzzles sourced from both the Internet and expert authors. We also propose Mosaic-Agent, a novel agent designed to assess LLMs' performance in this setting. To evaluate reasoning quality, we develop a multi-dimensional protocol measuring logical consistency,…
