LLM-Cave: A benchmark and light environment for large language models reasoning and decision-making system
Huanyu Li, Zongyuan Li, Wei Huang, Xian Guo

TL;DR
LLM-Cave introduces a lightweight, structured environment and benchmark for evaluating large language models' reasoning and decision-making abilities, emphasizing multi-step reasoning and efficiency.
Contribution
This paper presents LLM-Cave, a novel environment and benchmark for assessing LLM reasoning and decision-making, highlighting strategies to improve performance in weaker models.
Findings
DeepSeek-R1 achieved the highest success rate on complex tasks
Smaller models improved with Chain of Speculation and Planner-Critic strategies
Structured multi-step reasoning enhances decision-making in LLMs
Abstract
Large language models (LLMs) such as ChatGPT o1, ChatGPT o3, and DeepSeek R1 have shown great potential in solving difficult problems. However, current LLM evaluation benchmarks are limited to one-step interactions. Some existing sequential decision-making environments, such as TextStarCraftII and LLM-PySC2, are too complicated and require hours of interaction to complete a game. In this paper, we introduce LLM-Cave, a benchmark and light environment for LLM reasoning and decision-making systems. The environment is a classic problem from the era of symbolic artificial intelligence: the agent explores the environment and avoids potential losses by reasoning about nearby dangers from partially observable state information. In the experiment, we evaluated the sequential reasoning ability, decision-making performance and computational efficiency of mainstream large language…
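The abstract describes an agent that avoids danger by reasoning from partial observations. A minimal sketch of that kind of inference on a square grid follows. It is not LLM-Cave's actual interface or rule set: the grid layout, the boolean "warning" percept, and the names `neighbors`, `percept`, and `provably_safe` are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]  # (row, column); hypothetical coordinate convention


def neighbors(cell: Cell, size: int) -> Set[Cell]:
    """Return the orthogonally adjacent cells that lie inside a size x size grid."""
    r, c = cell
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return {(x, y) for x, y in candidates if 0 <= x < size and 0 <= y < size}


def percept(cell: Cell, hazards: Set[Cell], size: int) -> bool:
    """Assumed observation model: True if any adjacent cell holds a hazard."""
    return bool(neighbors(cell, size) & hazards)


def provably_safe(visited: Dict[Cell, bool], size: int) -> Set[Cell]:
    """Infer cells that must be hazard-free from the percepts seen so far.

    `visited` maps each visited cell to the warning flag observed there.
    """
    safe = set(visited)  # the agent survived these cells
    for cell, warned in visited.items():
        if not warned:
            # No warning means none of the neighbours can hold a hazard.
            safe |= neighbors(cell, size)
    return safe


# Usage: a 4x4 grid with one hidden hazard at (0, 2).
hazards = {(0, 2)}
observations = {
    (0, 0): percept((0, 0), hazards, 4),  # no warning here
    (0, 1): percept((0, 1), hazards, 4),  # warning: a hazard is adjacent
}
safe_cells = provably_safe(observations, 4)
```

In this toy setting the warning at (0, 1) rules nothing in, so only the cells cleared by the quiet observation at (0, 0) count as safe. Deciding where to step next from such partial evidence is the sort of multi-step reasoning the benchmark is said to target.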
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Explainable Artificial Intelligence (XAI)
