WILT: A Multi-Turn, Memorization-Robust Inductive Logic Benchmark for LLMs
Eryk Banatt, Jonathan Cheng, Skanda Vaidyanath, Tiffany Hwu

TL;DR
WILT is a multi-turn reasoning benchmark designed to evaluate LLMs' ability to infer hidden logical functions through interactive testing, exposing their limitations in complex reasoning tasks beyond memorization.
Contribution
This paper introduces WILT, a novel multi-turn reasoning benchmark in which LLMs must infer hidden functions through interactive testing; the benchmark is designed to resist memorization and to expose weaknesses in LLM reasoning.
Findings
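The best-performing LLM achieves only 28% accuracy on WILT.
Models show different strengths: some are better at narrowing the hypothesis space by proposing informative test cases, while others are better at deducing the hidden function from the observed cases.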
WILT exposes significant reasoning gaps in current LLMs.
Abstract
While large language models have shown impressive capabilities across a wide range of domains, they still struggle with reasoning tasks that require gathering evidence over multiple turns and drawing logical conclusions from it. These challenges are a major obstacle for LLM chat interfaces, which rely on multi-turn interactions to support effective collaboration, and they cause real-world problems; for example, customer-service chatbots must gather the necessary information from customers over multiple turns to diagnose and resolve their problems. Despite the multi-turn nature of many real-world LLM use cases, most existing benchmarks rely on carefully curated single-turn tests, which often blur the line between memorization and genuine reasoning. To address this, we introduce the Wason Inductive Logic Test (WILT), a simple yet challenging multi-turn…
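To make the interactive setup concrete, here is a minimal Python (3.9+) sketch of one episode of a Wason-style test: each turn proposes a test case, the environment answers True or False, and hypotheses inconsistent with the answer are discarded. It assumes a hidden rule over integer triples (Wason's classic "strictly ascending" 2-4-6 rule), a small hypothetical candidate list `CANDIDATES`, and a hypothetical 30-turn budget `max_turns`; the function names are illustrative and none of this is taken from the WILT harness itself. It is a sketch of the kind of reasoning the benchmark probes, not the authors' evaluation code.

```python
from typing import Callable

Triple = tuple[int, int, int]
Rule = Callable[[int, int, int], bool]


def hidden_rule(x: int, y: int, z: int) -> bool:
    # Hypothetical hidden rule: Wason's classic 2-4-6 rule ("strictly ascending").
    # The rules actually used in WILT may differ.
    return x < y < z


# Hypothetical hypothesis space the tester reasons over; not taken from WILT.
CANDIDATES: dict[str, Rule] = {
    "ascending by 2": lambda x, y, z: y - x == 2 and z - y == 2,
    "all even": lambda x, y, z: x % 2 == 0 and y % 2 == 0 and z % 2 == 0,
    "strictly ascending": lambda x, y, z: x < y < z,
    "sum is even": lambda x, y, z: (x + y + z) % 2 == 0,
    "x < z": lambda x, y, z: x < z,
}


def query(triple: Triple) -> bool:
    """One turn: the environment reports whether the triple satisfies the hidden rule."""
    return hidden_rule(*triple)


def run_episode(tests: list[Triple], max_turns: int = 30) -> list[str]:
    """Propose tests, observe True/False feedback, and keep only consistent hypotheses.

    max_turns is a hypothetical test budget; the episode also stops early
    once at most one candidate survives.
    """
    surviving = dict(CANDIDATES)
    for triple in tests[:max_turns]:
        observed = query(triple)
        # Hypothesis narrowing: drop every candidate that disagrees with the feedback.
        surviving = {
            name: rule for name, rule in surviving.items() if rule(*triple) == observed
        }
        if len(surviving) <= 1:
            break
    # Function deduction: whatever survives is the tester's final guess (or guesses).
    return sorted(surviving)


if __name__ == "__main__":
    # (2, 4, 6) alone confirms every candidate; the later tests are what discriminate.
    tests = [(2, 4, 6), (1, 2, 3), (3, 2, 1), (1, 3, 2)]
    print(run_episode(tests))  # ['strictly ascending']
```

The example mirrors the two skills named in the findings: choosing tests that split the surviving hypotheses (here, `(2, 4, 6)` is consistent with all five candidates and therefore uninformative on its own) and committing to the rule that remains once the evidence rules out the alternatives.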
