SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense
Yifan Jiang, Filip Ilievski, Kaixin Ma

TL;DR
This paper introduces BRAINTEASER(S), a new benchmark and task for evaluating AI systems' ability to perform lateral thinking and defy commonsense in reasoning, with extensive competition results and analysis.
Contribution
It presents the first task and benchmark supporting both zero-shot and fine-tuning settings for lateral thinking evaluation in AI systems.
Findings
483 team submissions from 182 participants
Systems show limited lateral thinking ability
Analysis highlights challenges in reasoning and commonsense defiance
Abstract
While vertical thinking relies on logical and commonsense reasoning, lateral thinking requires systems to defy commonsense associations and overwrite them through unconventional thinking. Lateral thinking has been shown to be challenging for current models but has received little attention. A recent benchmark, BRAINTEASER, aims to evaluate current models' lateral thinking ability in a zero-shot setting. In this paper, we split the original benchmark to also support a fine-tuning setting and present SemEval Task 9: BRAINTEASER(S), the first task at this competition designed to test the system's reasoning and lateral thinking ability. As a popular task, BRAINTEASER(S)'s two subtasks received 483 team submissions from 182 participants during the competition. This paper provides a fine-grained system analysis of the competition results, together with a reflection on what this means for the…
Taxonomy
Topics: Innovative Teaching and Learning Methods · Neuroscience, Education and Cognitive Function
