BRAINTEASER: Lateral Thinking Puzzles for Large Language Models
Yifan Jiang, Filip Ilievski, Kaixin Ma, Zhivar Sourati

TL;DR
BRAINTEASER is a multiple-choice benchmark of 1,100 lateral thinking puzzles that tests whether language models can defy default commonsense associations; experiments reveal significant gaps in current models' capabilities.
Contribution
This paper presents the first lateral thinking benchmark for language models, built with a three-step procedure: data collection, distractor generation, and generation of adversarial examples.
Findings
Models significantly underperform humans on lateral thinking puzzles.
Model consistency decreases with adversarial question formats.
Benchmark data and evaluation code are publicly available.
Abstract
The success of language models has inspired the NLP community to attend to tasks that require implicit and complex reasoning, relying on human-like commonsense mechanisms. While such vertical thinking tasks have been relatively popular, lateral thinking puzzles have received little attention. To bridge this gap, we devise BRAINTEASER: a multiple-choice Question Answering task designed to test the model's ability to exhibit lateral thinking and defy default commonsense associations. We design a three-step procedure for creating the first lateral thinking benchmark, consisting of data collection, distractor generation, and generation of adversarial examples, leading to 1,100 puzzles with high-quality annotations. To assess the consistency of lateral reasoning by models, we enrich BRAINTEASER based on a semantic and contextual reconstruction of its questions. Our experiments with…
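The consistency check described in the abstract can be illustrated with a small scoring sketch. It assumes each puzzle appears in three variants (the original plus a semantic and a contextual reconstruction) and compares per-question accuracy with a stricter per-puzzle score that counts a puzzle only if every variant is answered correctly. The record layout, the field names, and the function name `score_consistency` are hypothetical and are not taken from the paper's released code; the paper's exact metric definitions may differ.

```python
from collections import defaultdict


def score_consistency(predictions):
    """Return (instance_accuracy, group_accuracy) for a list of predictions.

    Each prediction is a dict with the hypothetical keys:
      puzzle_id: identifier shared by all variants of one puzzle
      variant:   "original", "semantic", or "contextual"
      predicted: index of the option the model chose
      gold:      index of the correct option
    """
    if not predictions:
        raise ValueError("predictions must not be empty")

    # Group the correctness flags of all variants under their puzzle.
    by_puzzle = defaultdict(list)
    for p in predictions:
        by_puzzle[p["puzzle_id"]].append(p["predicted"] == p["gold"])

    # Instance accuracy: every question counts on its own.
    total = sum(len(flags) for flags in by_puzzle.values())
    correct = sum(sum(flags) for flags in by_puzzle.values())
    instance_accuracy = correct / total

    # Group accuracy: a puzzle counts only if all its variants are correct.
    group_accuracy = sum(all(flags) for flags in by_puzzle.values()) / len(by_puzzle)
    return instance_accuracy, group_accuracy


# Made-up example data: puzzle "a" is solved in every variant,
# puzzle "b" fails on its contextual reconstruction.
example = [
    {"puzzle_id": "a", "variant": "original", "predicted": 0, "gold": 0},
    {"puzzle_id": "a", "variant": "semantic", "predicted": 0, "gold": 0},
    {"puzzle_id": "a", "variant": "contextual", "predicted": 2, "gold": 2},
    {"puzzle_id": "b", "variant": "original", "predicted": 1, "gold": 1},
    {"puzzle_id": "b", "variant": "semantic", "predicted": 1, "gold": 1},
    {"puzzle_id": "b", "variant": "contextual", "predicted": 3, "gold": 0},
]

instance_acc, group_acc = score_consistency(example)
print(f"instance accuracy: {instance_acc:.3f}, group accuracy: {group_acc:.3f}")
```

On the made-up data, five of six questions are correct but only one of two puzzles is solved in every variant, so the group score falls below the instance score. That gap is the kind of inconsistency the adversarial reconstructions are meant to expose.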
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
