LogicSkills: A Structured Benchmark for Formal Reasoning in Large Language Models

Brian Rabern; Philipp Mondorf; Barbara Plank

arXiv:2602.06533 · cs.AI · March 18, 2026

TL;DR

This paper introduces LogicSkills, a benchmark designed to evaluate fundamental logical skills in large language models, revealing strengths in validity assessment but weaknesses in symbolization and countermodel construction.

Contribution

The paper presents a novel benchmark that isolates three core logical skills (formal symbolization, countermodel construction, and validity assessment), evaluates LLMs on each, and highlights gaps in their logical reasoning capabilities.

Findings

01 LLMs perform well on validity assessment.

02 LLMs perform worse on formal symbolization and countermodel construction.

03 Reasoning-tuned models perform better across all skills.

Abstract

Large language models perform well on many logical reasoning benchmarks, but it remains unclear which core logical skills they truly master. To address this, we introduce LogicSkills, a benchmark that isolates three fundamental logical skills: (i) formal symbolization: translating premises into first-order logic; (ii) countermodel construction: showing that an argument is logically invalid by constructing a finite countermodel; and (iii) validity assessment: determining whether a conclusion follows from a set of premises. Items are drawn from the two-variable fragment of first-order logic without identity and are presented in both English and a Carrollian nonce-word language. All instances are solver-verified with Z3 for correctness and non-triviality. Across conventional instruction-tuned LLMs, performance is…
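To make skills (ii) and (iii) concrete, here is a minimal Python sketch of brute-force finite countermodel search. It is not the benchmark's pipeline (the authors report using Z3); the helper names `interpretations` and `find_countermodel` are hypothetical, and formulas are assumed to be hand-symbolized as Python predicates over a finite domain. The example argument "for every x there is a y with R(x, y); therefore some y has R(x, y) for every x" uses only the variables x and y and no identity, so it stays inside the two-variable fragment the abstract describes.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Iterator, List, Optional, Tuple

# An interpretation maps each predicate symbol to its extension:
# a set of argument tuples over the domain {0, ..., n-1}.
Interp = Dict[str, FrozenSet[Tuple[int, ...]]]
# A (hand-symbolized) formula is evaluated against a domain and an interpretation.
Formula = Callable[[range, Interp], bool]


def interpretations(n: int, signature: Dict[str, int]) -> Iterator[Interp]:
    """Yield every interpretation of `signature` (name -> arity) over a domain of size n."""
    dom = range(n)
    names = sorted(signature)
    # All possible argument tuples for each predicate.
    cells = [list(product(dom, repeat=signature[p])) for p in names]
    # Each bitmask selects which tuples belong to that predicate's extension.
    for masks in product(*(range(2 ** len(c)) for c in cells)):
        yield {
            name: frozenset(t for i, t in enumerate(c) if (mask >> i) & 1)
            for name, c, mask in zip(names, cells, masks)
        }


def find_countermodel(
    premises: List[Formula],
    conclusion: Formula,
    signature: Dict[str, int],
    max_size: int,
) -> Optional[Tuple[int, Interp]]:
    """Return (domain size, interpretation) making all premises true and the
    conclusion false, searching domain sizes 1..max_size; None if none is found.
    A None result is only bounded evidence of validity, not a proof."""
    for n in range(1, max_size + 1):
        dom = range(n)
        for interp in interpretations(n, signature):
            if all(p(dom, interp) for p in premises) and not conclusion(dom, interp):
                return n, interp
    return None


def forall_exists(dom: range, interp: Interp) -> bool:
    """Symbolization of: for all x there exists y such that R(x, y)."""
    return all(any((x, y) in interp["R"] for y in dom) for x in dom)


def exists_forall(dom: range, interp: Interp) -> bool:
    """Symbolization of: there exists y such that for all x, R(x, y)."""
    return any(all((x, y) in interp["R"] for x in dom) for y in dom)


if __name__ == "__main__":
    # Invalid direction: a countermodel exists.
    print(find_countermodel([forall_exists], exists_forall, {"R": 2}, 3))
    # Valid direction: no countermodel up to domain size 3.
    print(find_countermodel([exists_forall], forall_exists, {"R": 2}, 3))
```

For the invalid direction, no one-element model works, so the search should return a two-element countermodel with R = {(0, 1), (1, 0)}: every element is related to something, but no single element receives R from everyone. The converse argument yields no countermodel within the bound. Exhaustive enumeration grows as 2 to the power of the number of argument tuples, which is why a real benchmark would delegate this check to a solver such as Z3 rather than enumerate.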

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Constraint Satisfaction and Optimization