Reasoning Capabilities of Large Language Models on Dynamic Tasks

Annie Wong; Thomas Bäck; Aske Plaat; Niki van Stein; Anna V. Kononova

arXiv:2505.10543 · cs.AI · August 12, 2025

TL;DR

This paper evaluates large language models' reasoning abilities in dynamic tasks, revealing performance gaps, the impact of prompting strategies, and persistent limitations compared to human reasoning.

Contribution

It systematically assesses prompting strategies on dynamic tasks, highlighting their effects and the ongoing challenges in achieving human-like reasoning in large language models.

Findings

01 Larger models generally outperform smaller ones.

02 Strategic prompting can narrow the performance gap between smaller and larger models.

03 Advanced prompting benefits smaller models more than larger ones on complex tasks.

Abstract

Large language models excel on static benchmarks, but their ability as self-learning agents in dynamic environments remains unclear. We evaluate three prompting strategies (self-reflection, heuristic mutation, and planning) across dynamic tasks with open-source models. First, we find that larger models generally outperform smaller ones, but that strategic prompting can close this performance gap. Second, an overly long prompt can negatively impact smaller models on basic reactive tasks, while larger models show more robust behaviour. Third, advanced prompting techniques primarily benefit smaller models on complex games, but offer less improvement for already high-performing large language models. Yet, we find that advanced reasoning methods yield highly variable outcomes: while capable of significantly improving performance when reasoning and decision-making align, they also introduce…

Code & Models

Repositories

ann-w/towards-a-deeper-understanding-of-reasoning-capabilities-in-large-language-models
Official

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Language and cultural evolution

Methods: Self-Learning