Large Language Models' Reasoning Stalls: An Investigation into the Capabilities of Frontier Models

Lachlan McGinness; Peter Baumgartner

arXiv:2505.19676·cs.AI·September 18, 2025

TL;DR

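This paper evaluates the reasoning capabilities of frontier large language models from December 2023 and August 2024. It finds that progress stalled over that nine-month period and that almost all improvement since GPT-4 can be attributed to hidden system prompts or to models trained to apply generic Chain of Thought prompting automatically, rather than to genuine reasoning enhancements.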

Contribution

The study evaluates state-of-the-art LLMs from December 2023 and August 2024 on PRONTOQA steamroller reasoning problems, develops methods for assessing response accuracy and correct answer correlation, and analyzes the impact of prompting strategies on reasoning performance.

Findings

01

Progress in LLM reasoning stalled over the nine-month period between December 2023 and August 2024.

02

Almost all improvement since GPT-4 can be attributed to hidden system prompts or to training models to use generic Chain of Thought prompting automatically.

03

Among the ATP reasoning strategies tried, current frontier models best follow the bottom-up (forward-chaining) strategy.

Abstract

Empirical methods to examine the capability of Large Language Models (LLMs) to use Automated Theorem Prover (ATP) reasoning strategies are studied. We evaluate the performance of state-of-the-art models from December 2023 and August 2024 on PRONTOQA steamroller reasoning problems. To that end, we develop methods for assessing LLM response accuracy and correct answer correlation. Our results show that progress in improving LLM reasoning abilities has stalled over the nine-month period. By tracking completion tokens, we show that almost all improvement in reasoning ability since GPT-4 was released can be attributed to either hidden system prompts or the training of models to automatically use generic Chain of Thought prompting strategies. Among the ATP reasoning strategies tried, we found that current frontier LLMs are best able to follow the bottom-up (also known as forward-chaining)…
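For readers unfamiliar with the bottom-up (forward-chaining) strategy named in the abstract, the following is a minimal sketch of the general technique, not the paper's prompts or evaluation code. It starts from known facts and repeatedly applies rules whose premises are all satisfied until the goal is derived or nothing new can be added. The function name `forward_chain`, the rule encoding as `(premises, conclusion)` pairs, and the example facts are illustrative assumptions and are not taken from PRONTOQA.

```python
def forward_chain(facts, rules, goal):
    """Return True if `goal` can be derived from `facts` using `rules`.

    `rules` is a list of (premises, conclusion) pairs over ground facts.
    This encoding is a hypothetical simplification for illustration.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule only when every premise is already known.
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return goal in known


# Hypothetical example facts and rules (not from the benchmark).
facts = {"cat(alex)"}
rules = [
    (("cat(alex)",), "mammal(alex)"),
    (("mammal(alex)",), "animal(alex)"),
]

print(forward_chain(facts, rules, "animal(alex)"))
print(forward_chain(facts, rules, "bird(alex)"))
```

Bottom-up search works from the premises toward the goal, in contrast to top-down (backward-chaining) strategies that start at the goal and look for rules that could establish it.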

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multi-Agent Systems and Negotiation

Methods: Linear Layer · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing · Multi-Head Attention · Attention Is All You Need · Layer Normalization · Byte Pair Encoding