JustLogic: A Comprehensive Benchmark for Evaluating Deductive Reasoning in Large Language Models

Michael K. Chen; Xikun Zhang; Dacheng Tao

arXiv:2501.14851 · cs.CL · May 12, 2025

TL;DR

JustLogic is a new, complex, and knowledge-independent benchmark designed to rigorously evaluate deductive reasoning in large language models, revealing significant gaps between current models and human reasoning capabilities.

Contribution

We introduce JustLogic, a synthetic benchmark that addresses existing evaluation limitations by increasing task complexity, removing prior knowledge bias, and enabling detailed error analysis.

Findings

01

State-of-the-art (SOTA) reasoning LLMs match or exceed average human performance but fall short of the human performance ceiling.

02

Non-reasoning models underperform compared to humans.

03

JustLogic enables in-depth analysis of reasoning depth and argument structure effects.

Abstract

Logical reasoning is a critical component of Large Language Models (LLMs), and substantial research efforts in recent years have aimed to enhance their deductive reasoning capabilities. However, existing deductive reasoning benchmarks, which are crucial for evaluating and advancing LLMs, are inadequate due to their lack of task complexity, presence of prior knowledge as a confounder, and superficial error analysis. To address these deficiencies, we introduce JustLogic, a synthetically generated deductive reasoning benchmark designed for rigorous evaluation of LLMs. JustLogic is (i) highly complex, capable of generating a diverse range of linguistic patterns, vocabulary, and argument structures; (ii) prior knowledge independent, eliminating the advantage of models possessing prior knowledge and ensuring that only deductive reasoning is used to answer questions; and (iii) capable of…
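To make the idea of a synthetic, prior-knowledge-independent question concrete, here is a minimal sketch in Python. It is not the authors' generator: the filler sentences, the two argument forms, the function names, and the True/False/Uncertain label set are all assumptions chosen for illustration. The point it tries to show is that the premises are deliberately implausible, so the label can only be derived from the argument structure and not from world knowledge.

```python
import random

# Hypothetical filler sentences (not from the benchmark). Their real-world
# truth is irrelevant; only the logical form of the argument matters.
SENTENCES = [
    "the moon is made of glass",
    "rivers flow uphill on Tuesdays",
    "every violin is blue",
    "cats invented arithmetic",
]

NOT = "it is not the case that "


def sentence(s):
    """Capitalize the first letter only and add a full stop."""
    return s[0].upper() + s[1:] + "."


def negate(s):
    """Negate a statement, collapsing a double negation."""
    return s[len(NOT):] if s.startswith(NOT) else NOT + s


def modus_ponens(p, q):
    """From 'if p then q' and 'p', conclude 'q'."""
    return [sentence(f"if {p}, then {q}"), sentence(p)], q


def modus_tollens(p, q):
    """From 'if p then q' and 'not q', conclude 'not p'."""
    return [sentence(f"if {p}, then {q}"), sentence(negate(q))], negate(p)


FORMS = [modus_ponens, modus_tollens]


def make_question(rng):
    """Build one question: premises, a query statement, and its label."""
    p, q, r = rng.sample(SENTENCES, 3)
    premises, conclusion = rng.choice(FORMS)(p, q)
    label = rng.choice(["True", "False", "Uncertain"])
    if label == "True":
        statement = conclusion          # entailed by the premises
    elif label == "False":
        statement = negate(conclusion)  # contradicted by the premises
    else:
        statement = r                   # unrelated, so undetermined
    return {"premises": premises, "statement": sentence(statement), "label": label}


if __name__ == "__main__":
    question = make_question(random.Random(0))
    for line in question["premises"]:
        print(line)
    print("Statement:", question["statement"])
    print("Label:", question["label"])
```

A fuller generator in this spirit would chain several argument forms so that one conclusion feeds the next premise set, which is one way reasoning depth could be varied; that extension is likewise an assumption and is not shown here.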

Code & Models

Repositories

michaelchen-lab/justlogic
Official

Datasets

michaelchenkj/JustLogic
dataset · 362 downloads

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI)