JustLogic: A Comprehensive Benchmark for Evaluating Deductive Reasoning in Large Language Models
Michael K. Chen, Xikun Zhang, Dacheng Tao

TL;DR
JustLogic is a new, complex, and knowledge-independent benchmark designed to rigorously evaluate deductive reasoning in large language models, revealing significant gaps between current models and human reasoning capabilities.
Contribution
We introduce JustLogic, a synthetic benchmark that addresses existing evaluation limitations by increasing task complexity, removing prior knowledge bias, and enabling detailed error analysis.
Findings
State-of-the-art (SOTA) reasoning LLMs match or surpass the human average but fall short of the human ceiling.
Non-reasoning models underperform humans.
JustLogic enables in-depth analysis of how reasoning depth and argument structure affect model performance.
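The kind of breakdown the last finding refers to amounts to grouping per-question correctness by a question attribute. The sketch below is a minimal illustration, assuming each evaluated question carries a reasoning depth and an argument-structure tag. The field names (`depth`, `structure`, `correct`) and the records themselves are hypothetical toy inputs, not JustLogic's schema and not results from the paper.

```python
from collections import defaultdict

# Hypothetical toy records: field names and values are illustrative only,
# not JustLogic's data format and not results reported in the paper.
results = [
    {"depth": 1, "structure": "modus_ponens", "correct": True},
    {"depth": 1, "structure": "modus_tollens", "correct": True},
    {"depth": 2, "structure": "modus_ponens", "correct": True},
    {"depth": 2, "structure": "disjunctive_syllogism", "correct": False},
    {"depth": 3, "structure": "modus_tollens", "correct": False},
    {"depth": 3, "structure": "modus_ponens", "correct": True},
]


def accuracy_by(records, key):
    """Group records by the value of `key` and return {value: accuracy}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        totals[record[key]] += 1
        hits[record[key]] += int(record["correct"])
    return {value: hits[value] / totals[value] for value in sorted(totals)}


if __name__ == "__main__":
    print(accuracy_by(results, "depth"))      # accuracy per reasoning depth
    print(accuracy_by(results, "structure"))  # accuracy per argument structure
```

Grouping by an arbitrary key keeps one helper usable for both axes mentioned above (depth and argument structure).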
Abstract
Logical reasoning is a critical component of Large Language Models (LLMs), and substantial research efforts in recent years have aimed to enhance their deductive reasoning capabilities. However, existing deductive reasoning benchmarks, which are crucial for evaluating and advancing LLMs, are inadequate due to their lack of task complexity, presence of prior knowledge as a confounder, and superficial error analysis. To address these deficiencies, we introduce JustLogic, a synthetically generated deductive reasoning benchmark designed for rigorous evaluation of LLMs. JustLogic is (i) highly complex, capable of generating a diverse range of linguistic patterns, vocabulary, and argument structures; (ii) prior knowledge independent, eliminating the advantage of models possessing prior knowledge and ensuring that only deductive reasoning is used to answer questions; and (iii) capable of…
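To make "synthetically generated" and "prior knowledge independent" concrete, here is a toy generator. It is not the authors' generator. It builds a modus ponens chain of a chosen depth over deliberately counterfactual statements, so world knowledge cannot help, and it derives the gold label purely from the premises by forward chaining. The three-way label set (True / False / Uncertain), the statement list, and all function names are assumptions made for illustration.

```python
import random

# Hypothetical counterfactual statements: their real-world truth is irrelevant,
# so only the premises can determine the answer.
STATEMENTS = [
    "the moon is made of cheese",
    "all birds can swim",
    "copper is lighter than air",
    "every cat speaks French",
    "rivers flow uphill",
]


def negate(lit):
    """A literal is (proposition text, polarity); flip its polarity."""
    return (lit[0], not lit[1])


def render(lit):
    text, positive = lit
    return text if positive else f"it is not the case that {text}"


def sentence(text):
    """Uppercase only the first character and add a full stop."""
    return text[:1].upper() + text[1:] + "."


def closure(facts, rules):
    """Forward-chain (antecedent -> consequent) rules until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            if antecedent in known and consequent not in known:
                known.add(consequent)
                changed = True
    return known


def label(facts, rules, query):
    """Gold label decided by deduction from the premises alone."""
    known = closure(facts, rules)
    if query in known:
        return "True"
    if negate(query) in known:
        return "False"
    return "Uncertain"


def generate(statements, depth, rng):
    """Return (premises, candidate conclusion, label) for a chain of `depth` rules."""
    props = rng.sample(statements, depth + 2)
    chain, spare = props[:-1], props[-1]  # `spare` never appears in the premises
    facts = [(chain[0], True)]
    rules = [((a, True), (b, True)) for a, b in zip(chain, chain[1:])]
    query = rng.choice([(chain[-1], True), (chain[-1], False), (spare, True)])
    premises = [sentence(render(facts[0]))] + [
        sentence(f"if {render(a)}, then {render(b)}") for a, b in rules
    ]
    return premises, sentence(render(query)), label(facts, rules, query)


if __name__ == "__main__":
    premises, conclusion, gold = generate(STATEMENTS, depth=3, rng=random.Random(0))
    print("\n".join(premises))
    print("Claim:", conclusion, "->", gold)
```

Because the label is computed by the same deduction a solver would perform, every generated question has a verifiable answer regardless of how absurd its sentences are; varying `depth` is one simple knob for reasoning depth.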
