Do Large Language Models Excel in Complex Logical Reasoning with Formal Language?

Jin Jiang; Jianing Wang; Yuchen Yan; Yang Liu; Jianhua Zhu; Mengdi Zhang; Xunliang Cai; Liangcai Gao

arXiv:2505.16998 · cs.CL · May 23, 2025

TL;DR

This paper systematically evaluates large language models on complex logical reasoning tasks using formal languages, revealing their strengths, limitations, and the impact of training formats.

Contribution

It provides a comprehensive evaluation framework across models, tasks, and formats, and introduces a simple fine-tuning method to improve formal reasoning capabilities.

Findings

1. Thinking models outperform Instruct models with formal language.
2. All LLMs have limited inductive reasoning skills.
3. PoT format data yields best generalization performance.

Abstract

Large Language Models (LLMs) have been shown to achieve breakthrough performance on complex logical reasoning tasks. Nevertheless, most existing research focuses on employing formal language to guide LLMs to derive reliable reasoning paths, while systematic evaluations of these capabilities are still limited. In this paper, we aim to conduct a comprehensive evaluation of LLMs across various logical reasoning problems utilizing formal languages. From the perspective of three dimensions, i.e., spectrum of LLMs, taxonomy of tasks, and format of trajectories, our key findings are: 1) Thinking models significantly outperform Instruct models, especially when formal language is employed; 2) All LLMs exhibit limitations in inductive reasoning capability, irrespective of whether they use a formal language; 3) Data with PoT format achieves the best generalization performance across other…
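The abstract refers to trajectories in PoT format without showing one. As a rough illustration only, assuming PoT here carries its usual Program-of-Thought meaning (the model writes a short program and the answer comes from executing it), a trajectory for a toy deductive problem might look like the sketch below. The problem, the `forward_chain` helper, and the string encoding of facts are all hypothetical and are not taken from the paper or its benchmarks.

```python
# Hypothetical toy problem (not from the paper's evaluation data):
#   Premises: Tweety is a bird; if Tweety is a bird, Tweety is an animal;
#             if Tweety is an animal, Tweety is mortal.
#   Question: is Tweety mortal?

def forward_chain(facts, rules):
    """Apply (premise, conclusion) rules until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

# Facts and rules are encoded as plain strings (an assumption made for brevity).
facts = {"bird(tweety)"}
rules = [
    ("bird(tweety)", "animal(tweety)"),
    ("animal(tweety)", "mortal(tweety)"),
]

derived = forward_chain(facts, rules)
answer = "mortal(tweety)" in derived
print(answer)
```

In this style the reasoning steps live in the program itself, so the final answer can be checked by running the code rather than by reading free-form text.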

Code & Models

Repositories

jiangjin1999/formaleval (Official)

Videos

Do Large Language Models excel in Complex Logical Reasoning with Formal Language?· underline

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Natural Language Processing Techniques