On the logical skills of large language models: evaluations using arbitrarily complex first-order logic problems

Shokhrukh Ibragimov; Arnulf Jentzen; Benno Kuckuck

arXiv:2502.14180·cs.LG·February 21, 2025

TL;DR

This paper introduces a method for generating first-order logic problems of controllable complexity and uses them to evaluate the logical reasoning skills of large language models across difficulty levels.

Contribution

It presents a novel dataset generation approach that controls the complexity of first-order logic problems along multiple dimensions, and it evaluates LLMs' reasoning abilities on the resulting datasets.

Findings

01 LLM performance varies with problem complexity.

02 Recent models like DeepSeek-R1 and o3-mini demonstrate notable reasoning skills.

03 The datasets, the code used to generate them, and the evaluation data are publicly available for further research.

Abstract

We present a method of generating first-order logic statements whose complexity can be controlled along multiple dimensions. We use this method to automatically create several datasets consisting of questions asking for the truth or falsity of first-order logic statements in Zermelo-Fraenkel set theory. While the resolution of these questions does not require any knowledge beyond basic notation of first-order logic and set theory, it does require a degree of planning and logical reasoning, which can be controlled up to arbitrarily high difficulty by the complexity of the generated statements. Furthermore, we do extensive evaluations of the performance of various large language models, including recent models such as DeepSeek-R1 and OpenAI's o3-mini, on these datasets. All of the datasets along with the code used for generating them, as well as all data from the evaluations is publicly…
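
To make the idea of controllable complexity concrete, here is a minimal sketch in Python. It is not the generation procedure from the paper: the function names (`random_statement`, `random_matrix`), the two complexity knobs (`num_quantifiers`, `depth`), and the choice of connectives are all assumptions made for illustration. It only builds closed statements over the membership relation and does not decide whether they are true or false.

```python
import random

# Hypothetical illustration only: NOT the generator used in the paper.
CONNECTIVES = ["∧", "∨", "→"]
QUANTIFIERS = ["∀", "∃"]


def random_matrix(variables, depth, rng):
    """Build a quantifier-free formula with connective nesting `depth`."""
    if depth == 0:
        a, b = rng.choice(variables), rng.choice(variables)
        atom = f"{a} ∈ {b}"
        # Negate the atom about half of the time.
        return atom if rng.random() < 0.5 else f"¬({atom})"
    left = random_matrix(variables, depth - 1, rng)
    right = random_matrix(variables, depth - 1, rng)
    return f"({left} {rng.choice(CONNECTIVES)} {right})"


def random_statement(num_quantifiers, depth, seed=0):
    """Return a closed statement: a quantifier prefix followed by a matrix.

    Assumed complexity knobs: `num_quantifiers` sets the number of bound
    variables, and `depth` sets the nesting depth of the connectives, so
    the matrix contains 2**depth membership atoms.
    """
    rng = random.Random(seed)
    variables = [f"x{i}" for i in range(1, num_quantifiers + 1)]
    prefix = " ".join(f"{rng.choice(QUANTIFIERS)}{v}" for v in variables)
    return f"{prefix} {random_matrix(variables, depth, rng)}"


if __name__ == "__main__":
    # Larger arguments give longer, more deeply nested statements.
    print(random_statement(num_quantifiers=2, depth=1, seed=1))
    print(random_statement(num_quantifiers=4, depth=3, seed=1))
```

Because every variable in the matrix is bound by the prefix, each generated string is a sentence, which is the kind of object a truth-or-falsity question can be asked about. Fixing the seed makes a generated dataset reproducible.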

Code & Models

Repositories

bkuckuck/logical-skills-of-llms (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies

Methods: Sparse Evolutionary Training