RULEBREAKERS: Challenging LLMs at the Crossroads between Formal Logic and Human-like Reasoning

Jason Chan; Robert Gaizauskas; Zhixue Zhao

arXiv:2410.16502 · cs.CL · August 18, 2025

TL;DR

This paper introduces RULEBREAKERS, a dataset to evaluate large language models' ability to recognize and respond to rulebreaker scenarios in a human-like manner, revealing current models' limitations in aligning with human reasoning.

Contribution

The authors create the RULEBREAKERS dataset and evaluate seven LLMs on rulebreaker scenarios, highlighting the models' shortcomings in human-like reasoning.

Findings

01 Most of the seven evaluated LLMs, including GPT-4o, achieve only mediocre accuracy on RULEBREAKERS.

02 Models show some tendency to apply logical rules over-rigidly, unlike typical human reasoners.

03 Current models show limited utilization of world knowledge.

Abstract

Formal logic enables computers to reason in natural language by representing sentences in symbolic forms and applying rules to derive conclusions. However, in what our study characterizes as "rulebreaker" scenarios, this method can lead to conclusions that are typically not inferred or accepted by humans given their common sense and factual knowledge. Inspired by works in cognitive science, we create RULEBREAKERS, the first dataset for rigorously evaluating the ability of large language models (LLMs) to recognize and respond to rulebreakers (versus non-rulebreakers) in a human-like manner. Evaluating seven LLMs, we find that most models, including GPT-4o, achieve mediocre accuracy on RULEBREAKERS and exhibit some tendency to over-rigidly apply logical rules unlike what is expected from typical human reasoners. Further analysis suggests that this apparent failure is potentially…
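To make the idea of a rulebreaker concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' dataset format or evaluation code. It assumes one logical rule (disjunctive syllogism: from "A or B" and "not B", conclude A) and a hypothetical toy knowledge base `LOCATED_IN`; the names `Scenario`, `disjunctive_syllogism`, and `is_rulebreaker` are invented for this example, as are the example premises.

```python
from dataclasses import dataclass

# Hypothetical toy knowledge base (an assumption for this sketch):
# maps a place to the larger region that contains it.
LOCATED_IN = {"Stockholm": "Sweden", "Paris": "France"}


@dataclass(frozen=True)
class Scenario:
    option_a: str  # first disjunct, e.g. "Anne is in <option_a>"
    option_b: str  # second disjunct, e.g. "Anne is in <option_b>"
    negated: str   # the disjunct that the second premise rules out


def disjunctive_syllogism(s: Scenario) -> str:
    """Purely formal rule: from 'A or B' and 'not B', conclude A (and vice versa)."""
    if s.negated == s.option_b:
        return s.option_a
    if s.negated == s.option_a:
        return s.option_b
    raise ValueError("the negated statement must match one of the disjuncts")


def is_rulebreaker(s: Scenario) -> bool:
    """Flag cases where the formal conclusion clashes with world knowledge.

    Here the clash is: the concluded place lies inside the region that the
    second premise has just ruled out.
    """
    conclusion = disjunctive_syllogism(s)
    return LOCATED_IN.get(conclusion) == s.negated


# "Anne is in Stockholm or Sweden. Anne is not in Sweden."
rulebreaker = Scenario("Stockholm", "Sweden", "Sweden")
# "Anne is in Stockholm or Norway. Anne is not in Norway."
non_rulebreaker = Scenario("Stockholm", "Norway", "Norway")

for scenario in (rulebreaker, non_rulebreaker):
    print(disjunctive_syllogism(scenario), is_rulebreaker(scenario))
```

In both scenarios the formal rule yields the same conclusion, but only the first conflicts with the fact that Stockholm is in Sweden. A reasoner that applies the rule without consulting such knowledge would accept a conclusion that a typical human would reject.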

Videos

RULEBREAKERS: Challenging LLMs at the Crossroads between Formal Logic and Human-like Reasoning · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies

Methods: Softmax · Attention Is All You Need