LLM-Cave: A benchmark and light environment for large language models reasoning and decision-making system

Huanyu Li; Zongyuan Li; Wei Huang; Xian Guo

arXiv:2511.22598·cs.LG·December 1, 2025

Open Access

TL;DR

LLM-Cave introduces a lightweight, structured environment and benchmark for evaluating large language models' reasoning and decision-making abilities, emphasizing multi-step reasoning and efficiency.

Contribution

This paper presents LLM-Cave, a novel environment and benchmark for assessing LLM reasoning and decision-making, highlighting strategies to improve performance in weaker models.

Findings

1. DeepSeek-R1 achieved the highest success on complex tasks.
2. Smaller models improved with the Chain of Speculation and Planner-Critic strategies.
3. Structured multi-step reasoning enhances decision-making in LLMs.

Abstract

Large language models (LLMs) such as ChatGPT o1, ChatGPT o3, and DeepSeek R1 have shown great potential in solving difficult problems. However, current LLM evaluation benchmarks are limited to one-step interactions. Some existing sequential decision-making environments, such as TextStarCraftII and LLM-PySC2, are too complicated and require hours of interaction to complete a game. In this paper, we introduce LLM-Cave, a benchmark and light environment for LLM reasoning and decision-making systems. This environment is a classic instance in the era of Symbolism. Artificial intelligence enables the agent to explore the environment and avoid potential losses by reasoning about nearby dangers using partially observable state information. In the experiment, we evaluated the sequential reasoning ability, decision-making performance, and computational efficiency of mainstream large language…
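
The kind of reasoning the abstract describes, inferring which nearby cells are dangerous from partially observable state, can be illustrated with a small sketch. The Python below is not LLM-Cave's code or API. It assumes a hypothetical square grid with hidden pits, in the style of the classic Wumpus World, where the agent perceives a warning only when a pit is orthogonally adjacent. All names (`neighbors`, `perceive`, `infer_safe`) are invented for illustration.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]  # (row, column) on a hypothetical square grid


def neighbors(cell: Cell, size: int) -> Set[Cell]:
    """Return the orthogonally adjacent cells that lie inside a size x size grid."""
    r, c = cell
    candidates = {(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)}
    return {(x, y) for x, y in candidates if 0 <= x < size and 0 <= y < size}


def perceive(cell: Cell, pits: Set[Cell], size: int) -> bool:
    """Partial observation: True if at least one neighbouring cell hides a pit."""
    return any(n in pits for n in neighbors(cell, size))


def infer_safe(observations: Dict[Cell, bool], size: int) -> Set[Cell]:
    """Cells proven safe: every visited cell, plus all neighbours of any
    visited cell where no warning was perceived."""
    safe = set(observations)
    for cell, warning in observations.items():
        if not warning:
            safe |= neighbors(cell, size)
    return safe


# Usage: the agent starts at (0, 0) on a 3 x 3 grid with one hidden pit.
size = 3
pits = {(2, 2)}  # hidden from the agent; used only to generate percepts
observations = {(0, 0): perceive((0, 0), pits, size)}
print(sorted(infer_safe(observations, size)))
```

With no warning at the start cell, both of its neighbours are provably safe to visit. A warning would leave only the visited cell in the safe set, and the agent would need further observations before moving on.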

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Explainable Artificial Intelligence (XAI)