Exploring Decision-Making Capabilities of LLM Agents: An Experimental Study on Jump-Jump Game
Juwu Li

TL;DR
This paper investigates the decision-making abilities of large language model (LLM) agents through an experimental study using the Jump-Jump game, which tests spatial reasoning, physical modeling, and strategic planning.
Contribution
It introduces an experimental framework for evaluating LLM decision-making in a challenging casual game environment, highlighting the models' capabilities and limitations.
Findings
LLMs can perform basic spatial reasoning in the game.
Decision accuracy varies with game complexity.
Insights into LLMs' strategic planning abilities.
Abstract
The Jump-Jump game, as a simple yet challenging casual game, provides an ideal testing environment for studying LLM decision-making capabilities. The game requires players to precisely control jumping force based on current position and target platform distance, involving multiple cognitive aspects including spatial reasoning, physical modeling, and strategic planning. The player character (a red circle) must jump across platforms with appropriate force to maximize the score.
| Version | Avg Score | Success Rate | Avg Duration | Stability |
|---|---|---|---|---|
| Basic | 3.2 | 68% | 12.5s | Low |
| Optimized | 7.8 | 84% | 28.3s | Medium |
| Complete | 12.1 | 91% | 45.7s | High |
Juwu Li Jiangxi Teachers College
1 Introduction
With the rapid advancement of artificial intelligence technology, Large Language Models (LLMs) have demonstrated exceptional capabilities in natural language processing Asemi et al. (2020); Okunlaya et al. (2022); Oyelude (2021). These models not only understand and generate human language but also exhibit impressive reasoning and decision-making abilities in specific tasks Zhang et al. (2023, 2024). However, the performance of LLMs in gaming scenarios requiring real-time decisions remains to be thoroughly explored Zheng et al. (2023).
The Jump-Jump game Ali et al. (2021); Andrews et al. (2021); Arora et al. (2020), as a simple yet challenging casual game, provides an ideal testing environment for studying LLM decision-making capabilities. The game requires players to precisely control jumping force based on current position and target platform distance, involving multiple cognitive aspects including spatial reasoning, physical modeling, and strategic planning Oyetola and others (2023); Baungarten-Leon et al. (2024); Medavarapu (2024); Panda (2025); Lund and Ma (2021); Shahriar et al. (2024); Lund (2023); Wang and Lund (2023); Lund and Shamsi (2023). Figure 1 illustrates the basic gameplay mechanics of the Jump-Jump game, where the player character (red circle) must jump across platforms with appropriate force to maximize score.
The main contributions of this research include:
- We design and implement an LLM-based Jump-Jump game agent.
- We propose systematic prompt optimization strategies and experimentally validate the impact of different prompt designs on agent performance.
- We analyze the advantages and limitations of LLMs in game decision-making.
2 System Model
2.1 Environment Definition
The Jump-Jump game environment consists of the following core components:
(1) Game State Space:
- Player position: the character's coordinates in 2D space.
- Target platform: the platform boundaries.
- Physical parameters: gravity acceleration, velocity multipliers.
(2) Action Space:
Jumping force: a continuous value from 0 to 100 that controls the jump distance.
(3) State Transition Function:
The jumping mechanism follows simplified physics laws:
[TABLE]
(4) Reward Function:
- Successful landing: +1 point
- Jump failure: game over
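The state transition equations are not recoverable from this text, so the following is only a minimal sketch of an environment with the components listed above. It assumes a standard projectile model in which the force is scaled by velocity multipliers and the platforms sit at equal height. The constants `KX`, `KY`, and `G`, and the names `Platform`, `landing_x`, and `step`, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical constants: the paper mentions velocity multipliers and gravity,
# but their values are not given in this text.
KX = 0.5   # horizontal velocity per unit of force (assumed)
KY = 0.5   # vertical velocity per unit of force (assumed)
G = 9.8    # gravity acceleration (assumed)


@dataclass
class Platform:
    left: float
    right: float


def landing_x(player_x: float, force: float) -> float:
    """Landing point of a jump, assuming launch and landing at equal height."""
    vx, vy = KX * force, KY * force
    flight_time = 2.0 * vy / G  # time to return to the launch height
    return player_x + vx * flight_time


def step(player_x: float, target: Platform, force: float):
    """Return (new_x, reward, done): +1 on a successful landing, else game over."""
    if not 0.0 <= force <= 100.0:
        raise ValueError("force must be in [0, 100]")
    x = landing_x(player_x, force)
    if target.left <= x <= target.right:
        return x, 1, False
    return x, 0, True
```

Under these assumed constants, a force of 10 from `player_x = 0` lands at roughly 5.1, which succeeds on a platform spanning 4 to 6 and fails on one spanning 10 to 12.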
2.2 LLM Agent Architecture
The core architecture of the LLM agent includes four main modules, as shown in Figure 2:
- Perception Module: This module serves as the input interface, responsible for receiving and preprocessing game state information. It captures essential environmental data including player position coordinates, target platform boundaries, and relevant physical parameters, then formats this information into a structured representation suitable for the reasoning module.
- Reasoning Module: Acting as the decision-making core, this module processes the formatted game state through carefully designed prompts. It leverages the LLM’s natural language understanding and reasoning capabilities to analyze the current situation, apply game physics principles, and formulate jumping strategies based on the provided context and examples Zheng et al. (2025a); Wang et al. (2025); Zheng et al. (2025b).
- Action Module: This module translates the reasoning module’s decision into executable game actions. It outputs precise jumping force values (ranging from 0 to 100) based on the LLM’s analysis, ensuring the output format meets the game environment’s requirements.
- Feedback Module: Responsible for learning and adaptation, this module monitors game execution results and provides feedback for strategy adjustment. It analyzes successful and failed attempts to inform future decision-making processes, contributing to the agent’s overall performance improvement.
The information flow can be represented as: Game State → Perception Module → Prompt Processing → Reasoning Module → LLM Analysis → Action Module → Force Output → Game Execution → Feedback Module → Strategy Adjustment. This architecture enables the agent to maintain continuous interaction with the game environment while leveraging the LLM’s reasoning capabilities for optimal decision-making.
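The information flow described above could be wired together roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: the `llm` argument is any callable that maps a prompt string to a reply string, `env_step` is a hypothetical function that executes a jump and reports success, and the state keys and prompt wording are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple


def perceive(state: Dict[str, float]) -> str:
    """Perception: format the raw game state as text for the prompt."""
    return (f"player_x={state['player_x']:.1f}, "
            f"platform=[{state['left']:.1f}, {state['right']:.1f}]")


def reason(llm: Callable[[str], str], observation: str, history: List[str]) -> str:
    """Reasoning: build a prompt (with recent feedback) and query the LLM."""
    notes = "\n".join(history[-3:])
    prompt = f"Recent results:\n{notes}\nState: {observation}\nForce (0-100):"
    return llm(prompt)


def act(reply: str) -> float:
    """Action: turn the LLM reply into a force clamped to [0, 100]."""
    return max(0.0, min(100.0, float(reply.strip())))


def feedback(history: List[str], force: float, success: bool) -> None:
    """Feedback: record the outcome so later prompts can use it."""
    history.append(f"force={force:.1f} -> {'landed' if success else 'missed'}")


def run_round(llm: Callable[[str], str], state: Dict[str, float],
              env_step: Callable[[Dict[str, float], float], bool],
              history: List[str]) -> Tuple[float, bool]:
    """One pass through perception, reasoning, action, execution, and feedback."""
    force = act(reason(llm, perceive(state), history))
    success = env_step(state, force)
    feedback(history, force, success)
    return force, success
```

Passing the LLM and the environment in as callables keeps the loop testable with stubs, for example `run_round(lambda p: "42", state, lambda s, f: f > 40, [])`.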
3 Method
3.1 Basic Prompt Design
The foundation of our LLM agent’s decision-making capability lies in the careful design of prompts that enable the model to understand the game context and make appropriate jumping decisions. Our basic prompt design follows a structured approach that incorporates role definition, task description, game mechanics explanation, and output format specification. The initial prompt structure begins with clearly defining the agent’s role as a Jump-Jump game player. We provide the LLM with essential context about its responsibilities, emphasizing that it needs to analyze the current game state and determine the optimal jumping force. The prompt includes a comprehensive explanation of the game’s physics model, detailing how the jumping force translates into horizontal and vertical velocities, and how gravity affects the character’s trajectory.
To ensure consistent decision-making, we establish a standardized input format that provides the agent with all necessary information: the player’s current position coordinates, the target platform boundaries, and the relevant physical parameters, including velocity multipliers and gravity acceleration. This structured input format enables the LLM to process game state information systematically.
The basic prompt also incorporates fundamental strategic guidance, instructing the agent to consider the horizontal distance to the target platform and estimate the required force based on the physics model. We emphasize the importance of precision, as both under-jumping and over-jumping result in failure. The output format is strictly defined to return only a numerical value between 0 and 100, representing the recommended jumping force.
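A basic prompt with the four ingredients named in this section (role definition, task description, game mechanics, output format) might be assembled as in the sketch below. The wording of the template is illustrative and is not the authors' prompt; the parameter names `kx`, `ky`, and `g` stand in for the velocity multipliers and gravity, whose values are not given here.

```python
# Illustrative template only; the actual prompt text is not given in the paper.
BASIC_PROMPT = """You are a player of the Jump-Jump game.
Task: choose the jumping force that lands the character on the target platform.
Physics: horizontal velocity = force * {kx}; vertical velocity = force * {ky}; gravity = {g}.
Player position: x={px}, y={py}
Target platform: left={left}, right={right}
Both under-jumping and over-jumping end the game.
Reply with a single number between 0 and 100 and nothing else."""


def build_basic_prompt(px, py, left, right, kx, ky, g):
    """Fill the standardized input format with the current game state."""
    return BASIC_PROMPT.format(px=px, py=py, left=left, right=right,
                               kx=kx, ky=ky, g=g)
```

Keeping the state fields in fixed positions and with fixed labels is what makes the input format standardized across rounds.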
3.2 Prompt Optimization Strategies
Building upon the basic design, we implemented several optimization strategies to enhance the agent’s performance through iterative prompt refinement. These strategies address common failure patterns observed during initial testing and incorporate advanced reasoning techniques. Our first optimization strategy involves incorporating step-by-step reasoning guidance. We restructure the prompt to encourage the LLM to follow a systematic decision-making process: first calculating the horizontal distance to the target, then estimating the required trajectory based on physics principles, considering safety margins for precision, and finally determining the optimal force value. This structured reasoning approach significantly reduces calculation errors and improves decision consistency Saeidnia et al. (2024); Okunlaya et al. (2022); Shahriar et al. (2024).
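The reasoning steps listed above can be made concrete with a small worked calculation. The sketch assumes a projectile model in which the jump distance is d = 2 * KX * KY * F^2 / G for platforms at equal height; this model and the constants `KX`, `KY`, and `G` are assumptions, since the paper's equations are not available in this text.

```python
import math

# Assumed physics constants (hypothetical values, not from the paper).
KX, KY, G = 0.5, 0.5, 9.8


def required_force(player_x: float, left: float, right: float) -> float:
    """Follow the step-by-step procedure under the assumed projectile model."""
    # Step 1: horizontal distance to the target. Aiming at the platform
    # center leaves an equal safety margin on both sides.
    distance = (left + right) / 2.0 - player_x
    if distance <= 0:
        raise ValueError("target platform must be ahead of the player")
    # Step 2: invert d = 2 * KX * KY * F**2 / G to get the force.
    force = math.sqrt(G * distance / (2.0 * KX * KY))
    # Step 3: keep the result inside the allowed action range.
    return min(100.0, force)
```

With these constants, a platform spanning 4 to 6 seen from `player_x = 0` gives a distance of 5 and a force of `sqrt(98)`, a little under 9.9.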
The second major optimization introduces few-shot learning through carefully selected examples. We include 3-5 representative scenarios in the prompt, each demonstrating the complete reasoning process from input analysis to force determination. These examples cover various distance ranges and edge cases, helping the LLM understand the relationship between game state and appropriate actions. Each example includes the input state, detailed reasoning steps, recommended force, and expected outcome, providing a comprehensive learning template.
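One way to render such examples into the prompt is sketched below. The `Example` fields mirror the four parts named above (input state, reasoning steps, recommended force, expected outcome); the class name, the layout, and any example values passed in are hypothetical rather than the paper's.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    state: str       # input state as shown to the model
    reasoning: str   # worked reasoning steps
    force: float     # recommended force
    outcome: str     # expected outcome


def format_examples(examples: List[Example]) -> str:
    """Render few-shot examples as numbered blocks separated by blank lines."""
    blocks = []
    for i, ex in enumerate(examples, start=1):
        blocks.append(
            f"Example {i}\n"
            f"Input: {ex.state}\n"
            f"Reasoning: {ex.reasoning}\n"
            f"Force: {ex.force:g}\n"
            f"Outcome: {ex.outcome}"
        )
    return "\n\n".join(blocks)
```

The resulting string would be placed between the game mechanics explanation and the current state, so that each example has the same shape as the answer the model is asked to produce.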
To address the precision requirements of the game, we implement a calibration strategy that adjusts force recommendations based on observed patterns. Through empirical testing, we discovered that the basic physics calculations often require fine-tuning factors to account for the game’s specific implementation. We incorporate these calibration guidelines into the prompt, instructing the agent to apply distance-dependent adjustments and consider platform size variations.
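A distance-dependent adjustment of this kind can be expressed as a small lookup table. The paper does not report its calibration factors, so the distance thresholds and multipliers below are placeholders chosen only to show the mechanism; `calibration_text` shows how the same table could be turned into prompt guidance.

```python
# Hypothetical calibration table: (upper distance bound, force multiplier).
# The values are placeholders, not the factors used in the paper.
CALIBRATION = [(50.0, 1.05), (150.0, 1.00), (float("inf"), 0.95)]


def calibrate(raw_force: float, distance: float) -> float:
    """Scale a physics-based force estimate by a distance-dependent factor."""
    for upper, factor in CALIBRATION:
        if distance <= upper:
            return max(0.0, min(100.0, raw_force * factor))
    return raw_force


def calibration_text() -> str:
    """Express the same table as guideline sentences for the prompt."""
    lines = []
    for upper, factor in CALIBRATION:
        bound = "any larger distance" if upper == float("inf") else f"distance <= {upper:g}"
        lines.append(f"For {bound}, multiply the estimated force by {factor:.2f}.")
    return "\n".join(lines)
```

In practice the factors would be fitted from logged jumps, by comparing where the character actually landed against where the physics estimate predicted.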
Our final optimization strategy focuses on error prevention and recovery. We enhance the prompt with explicit warnings about common failure modes, such as the tendency to over-jump on longer distances or under-estimate force requirements for closer platforms. The optimized prompt includes decision validation steps, encouraging the agent to double-check its calculations and consider alternative force values when uncertainty exists.
The complete optimization process results in a multi-layered prompt that combines clear role definition, structured reasoning guidance, empirical examples, calibration factors, and error prevention mechanisms. This comprehensive approach enables the LLM agent to make more accurate and consistent decisions while maintaining adaptability to varying game conditions.
4 Experiment
4.1 Performance Comparison Results
Table 1 presents the comprehensive performance comparison across different versions of our LLM agent.
The performance trends are visualized in Figure 3, which clearly demonstrates the improvement achieved through prompt optimization.
4.2 Detailed Analysis
4.2.1 Learning Curve Analysis
The Complete version of the agent showed some adaptability during gameplay. As shown in Figure 4, decision accuracy improved as the number of game rounds increased, likely due to the strategic guidance included in the prompts.
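A learning curve of this kind can be computed from a per-round log of successes and failures. The sketch below uses a trailing window; the window length is an assumption, since the paper does not state how the accuracy in Figure 4 was aggregated.

```python
from typing import List


def rolling_accuracy(outcomes: List[bool], window: int = 10) -> List[float]:
    """Success rate over the most recent `window` rounds, for every round."""
    rates = []
    for i in range(len(outcomes)):
        recent = outcomes[max(0, i - window + 1): i + 1]
        rates.append(sum(recent) / len(recent))
    return rates
```

For the first few rounds the window is shorter than `window`, so early values are noisier than later ones.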
4.2.2 Error Pattern Analysis
Through analysis of failure cases, we identified the main error patterns, as illustrated in Figure 5:
1. Over-jumping (35%): Excessive force causing overshooting
2. Under-jumping (28%): Insufficient force failing to reach platform
3. Calculation errors (22%): Deviations in physics calculations
4. Other errors (15%): Including format errors, etc.
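The two geometric categories can be assigned automatically from where the character lands relative to the platform. The sketch below assumes each logged jump is a `(landing_x, left, right)` tuple, which is a hypothetical log format; calculation and format errors cannot be detected this way and would need the model's reasoning trace.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def classify_failure(landing_x: float, left: float, right: float) -> str:
    """Label a jump by comparing the landing point with the platform bounds."""
    if landing_x > right:
        return "over-jumping"
    if landing_x < left:
        return "under-jumping"
    return "success"


def error_shares(records: Iterable[Tuple[float, float, float]]) -> Dict[str, float]:
    """Share of each failure type among failed jumps only."""
    counts = Counter(classify_failure(*r) for r in records)
    counts.pop("success", None)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}
```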
4.2.3 Prompt Optimization Effect
The estimated contribution of each optimization strategy is as follows:
- Strategy guidance: approximately 12% improvement in success rate
- Example learning: approximately 8% improvement in success rate
- Output format standardization: 15% reduction in invalid outputs
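Counting invalid outputs requires a rule for what counts as valid. A minimal sketch of such a rule is given below, assuming a reply is valid when its last number lies in the range 0 to 100; the function name and the choice of the last number are assumptions, not the paper's parser.

```python
import re
from typing import Optional


def parse_force(reply: str) -> Optional[float]:
    """Return the last number in the reply if it lies in [0, 100], else None."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", reply)
    if not numbers:
        return None
    value = float(numbers[-1])
    return value if 0.0 <= value <= 100.0 else None
```

Taking the last number tolerates replies that show their reasoning before the answer, while a reply with no number or an out-of-range number is counted as invalid.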
4.3 Case Studies
Successful Case:
Failure Case:
5 Limitations and Conclusion
The limitations of this method include:
- Computational Precision Constraints: LLMs may exhibit errors in numerical calculations, particularly in complex physical modeling scenarios.
- Real-time Performance Issues: Each decision requires LLM API calls, introducing latency unsuitable for games requiring extremely high real-time performance.
- Consistency Problems: LLMs may produce different outputs for identical inputs, affecting decision stability Chahal et al. (2021); Souppaya (2024); Clark et al. (2021).
Conclusion: The experimental results demonstrate that LLM agents can achieve satisfactory performance in structured game environments through careful prompt engineering, with the Complete version reaching a 91% success rate and an average score of 12.1 (Table 1). However, challenges remain in computational accuracy, consistency, and real-time performance that require further investigation and improvement.
References
- [1] Ali et al. [2021] Muhammad Yousuf Ali, Salman Bin Naeem, and Rubina Bhatti. Artificial intelligence (AI) in Pakistani university library services. Library Hi Tech News, 38(8):12–15, 2021.
- [2] Andrews et al. [2021] James E. Andrews, Heather Ward, and Jung Won Yoon. UTAUT as a model for understanding intention to adopt AI and related technologies among librarians. The Journal of Academic Librarianship, 47(6):102437, 2021.
- [3] Arora et al. [2020] Dipti Arora, Alka Bansal, and Nishant Kumar. Invigorating libraries with application of artificial intelligence. Library Philosophy and Practice, page 3630, 2020.
- [4] Asemi et al. [2020] Asefeh Asemi, Andrea Ko, and Mohsen Nowkarizi. Intelligent libraries: A review on expert systems, artificial intelligence, and robot. Library Hi Tech, 39(2):412–434, 2020.
- [5] Baungarten-Leon et al. [2024] Emilio Isaac Baungarten-Leon, Susana Ortega-Cisneros, Mohamed Abdelmoneum, et al. The genesis of AI by AI integrated circuit: Where AI creates AI. Electronics, 13(9):1704, 2024.
- [6] Chahal et al. [2021] Husanjot Chahal, Sara Abdulla, Jonathan Murdick, and Ilya Rahkovsky. Mapping India’s AI potential. Technical report, Center for Security and Emerging Technology, 2021.
- [7] Clark et al. [2021] Jack Clark, Kyle Miller, and Rebecca Gelles. Measuring AI development: A prototype methodology to inform policy. Technical report, Center for Security and Emerging Technology, 2021.
- [8] Lund and Ma [2021] Breanne D. Lund and Jingjing Ma. A review of cluster analysis techniques and their uses in library and information science research: K-means and k-medoids clustering. Performance Measurement and Metrics, 22(3):161–173, 2021.
