APEX: Empowering LLMs with Physics-Based Task Planning for Real-time Insight
Wanjing Huang, Weixiang Yan, Zhen Zhang, Ambuj Singh

TL;DR
APEX enhances large language models with physics-based foresight, enabling better real-time task planning and decision-making in dynamic environments by modeling object interactions and predicting outcomes.
Contribution
This work introduces APEX, a novel framework that integrates physics reasoning into LLMs for improved real-world task planning without task-specific training.
Findings
APEX outperforms standard LLMs and VLM-based models on benchmarks.
Physics-informed prediction improves long-horizon planning.
Explicit physics reasoning bridges the gap between language and real-world tasks.
Abstract
Large Language Models (LLMs) demonstrate strong reasoning and task planning capabilities but remain fundamentally limited in physical interaction modeling. Existing approaches integrate perception via Vision-Language Models (VLMs) or adaptive decision-making through Reinforcement Learning (RL), but they fail to capture dynamic object interactions or require task-specific training, limiting their real-world applicability. We introduce APEX (Anticipatory Physics-Enhanced Execution), a framework that equips LLMs with physics-driven foresight for real-time task planning. APEX constructs structured graphs to identify and model the most relevant dynamic interactions in the environment, providing LLMs with explicit physical state updates. Simultaneously, APEX provides low-latency forward simulations of physically feasible actions, allowing LLMs to select optimal strategies based on predictive…
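The loop the abstract describes (build a graph of relevant interactions, forward-simulate each feasible action, pick the best predicted outcome) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: `Body`, `step`, `gap`, `interaction_graph`, `rollout_min_gap`, `choose_action`, and the `ACTIONS` table are hypothetical names, constant-velocity circles stand in for a real physics engine, and a simple argmax over predicted clearance stands in for the LLM's choice.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Body:
    """A circular object moving at constant velocity (toy stand-in for a physics engine)."""
    name: str
    x: float
    y: float
    vx: float
    vy: float
    radius: float


def step(body, dt):
    """Advance one body by dt seconds under constant velocity."""
    return Body(body.name, body.x + body.vx * dt, body.y + body.vy * dt,
                body.vx, body.vy, body.radius)


def gap(a, b):
    """Surface-to-surface distance; a negative value means the circles overlap."""
    return math.hypot(a.x - b.x, a.y - b.y) - a.radius - b.radius


def interaction_graph(bodies, horizon=2.0, dt=0.1, margin=0.5):
    """Return edges between pairs predicted to come within `margin` during the horizon."""
    edges = set()
    current = list(bodies)
    for _ in range(int(round(horizon / dt)) + 1):
        for a, b in combinations(current, 2):
            if gap(a, b) < margin:
                edges.add(frozenset((a.name, b.name)))
        current = [step(b, dt) for b in current]
    return edges


# Hypothetical discrete action set: each action is a velocity command for the agent.
ACTIONS = {"left": (-1.0, 0.0), "stay": (0.0, 0.0), "right": (1.0, 0.0)}


def rollout_min_gap(agent, obstacles, action, horizon=2.0, dt=0.1):
    """Simulate one action and return the smallest agent-obstacle gap seen."""
    vx, vy = ACTIONS[action]
    a = Body(agent.name, agent.x, agent.y, vx, vy, agent.radius)
    obs = list(obstacles)
    min_gap = math.inf
    for _ in range(int(round(horizon / dt))):
        a = step(a, dt)
        obs = [step(o, dt) for o in obs]
        for o in obs:
            min_gap = min(min_gap, gap(a, o))
    return min_gap


def choose_action(agent, obstacles, horizon=2.0, dt=0.1):
    """Evaluate every action by simulation and pick the one with the largest clearance.

    In an APEX-style loop the per-action outcomes would be handed to the LLM;
    the argmax here is a placeholder for that decision.
    """
    outcomes = {name: rollout_min_gap(agent, obstacles, name, horizon, dt)
                for name in ACTIONS}
    best = max(outcomes, key=outcomes.get)
    return best, outcomes
```

In this sketch the graph acts as a filter: only pairs with an edge would need to be described to the language model, while the per-action `outcomes` dictionary is the kind of predictive summary that could be placed in a prompt.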
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- Pragmatic integration of a physics engine into an LLM loop; the overall system is easy to understand and replicate in spirit.
- Clear problem motivation: teaching LLM agents to rely on external physics tools rather than internalizing fragile physical heuristics is a sensible direction.
- Breadth of tasks (synthetic QA, games/toy control, PHYRE) demonstrates that the loop can be wired up across settings.

- Limited methodological novelty.
- Single-step lookahead: the algorithmic core is a 1-step brute-force evaluation of enumerated actions; claims and tables about multi-step/higher-horizon complexity are not matched by controlled, implemented evidence.
- Tetris lacks standard hand-coded/Tetris-AI baselines (e.g., height/holes/bumpiness heuristics, MCTS with lookahead).
- Obstacle avoidance omits classic MPC/DWA/A*/RRT with the same simulator and budget.
- PHYRE relies on 10k uniformly random actio

- The paper attempts to address the physical reasoning limitations of LLMs by making use of physics engines.
- The paper provides some concrete examples of tasks and model outputs in the appendix.

- Overall, the introduced pipeline is very limiting. The framework makes many assumptions, but none are explicitly described in the paper. For example, the scene graph formulation assumes the scene contains mostly distinctive and rigid objects. Since the predictions are simulated with a physical engine, the method also assumes those objects are compatible with the simulator. What constraints are imposed on the acceptable objects in the scene? Moreover, the decision-making procedure requires a fi

1. Interesting high-level motivation. Bridging symbolic reasoning in LLMs with physically grounded modeling is an important and timely goal.
2. Attempt to unify physics reasoning and LLM planning. The modular architecture (graph → simulator → LLM → action) provides a readable system outline.

1. Conceptual novelty is limited. The core idea, using a physics engine to simulate candidate actions and feeding the results back to an LLM, is conceptually straightforward and has appeared in prior “simulation-in-the-loop” or “world-model prompting” works (e.g., Mind’s Eye, PiLoT, PhysVLM). The proposed Perception–Graph–Language–Physics–Action paradigm is mostly a re-labeling of existing perception-simulation-planning loops in robotics; there is no theoretical or algorithmic advance beyond modul

1. Introduces a novel graph-simulation-loop architecture to help LLMs in physical reasoning.
2. Evaluation across three diverse benchmarks with strong baselines and ablations.
3. Well-organized and accessible, with clear explanations of both high-level ideas and implementation details.

1. The framework's applicability is currently limited to motion-related problems, particularly collision prediction. This narrow scope and its reliance on a specific simulator make it brittle; adapting it to new tasks or physical phenomena likely requires significant re-implementation.
2. To demonstrate generalizability, future work should include experiments in more complex settings, such as those with cluttered scenes or deformable objects.
3. Although APEX's modularity is a strength, the pape
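One review above asks for hand-coded Tetris baselines built on height, holes, and bumpiness. As a rough illustration of what such features compute, here is a minimal sketch. The function names `board_features` and `heuristic_score`, the board encoding, and the weights are all hypothetical choices made for this example; they are not taken from the paper or from any particular Tetris AI.

```python
def board_features(board):
    """Compute aggregate height, holes, and bumpiness for a Tetris board.

    `board` is a list of rows, top row first; 1 marks a filled cell, 0 an empty one.
    """
    rows, cols = len(board), len(board[0])
    heights = []
    holes = 0
    for c in range(cols):
        col_height = 0
        seen_block = False
        for r in range(rows):
            if board[r][c]:
                if not seen_block:
                    # First filled cell from the top fixes the column height.
                    col_height = rows - r
                    seen_block = True
            elif seen_block:
                # An empty cell with a filled cell somewhere above it is a hole.
                holes += 1
        heights.append(col_height)
    bumpiness = sum(abs(heights[i] - heights[i + 1]) for i in range(cols - 1))
    return {"aggregate_height": sum(heights), "holes": holes, "bumpiness": bumpiness}


def heuristic_score(board, weights=(-0.5, -0.7, -0.2)):
    """Linear score over the three features; the weights are illustrative only."""
    f = board_features(board)
    w_height, w_holes, w_bump = weights
    return (w_height * f["aggregate_height"]
            + w_holes * f["holes"]
            + w_bump * f["bumpiness"])
```

A baseline of this kind would score the board resulting from each candidate placement and keep the highest-scoring one, which is the same enumerate-and-evaluate pattern as a one-step lookahead but without a language model in the loop.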
Taxonomy
Topics: Data Stream Mining Techniques · Scientific Computing and Data Management · Cloud Computing and Resource Management
Methods: Causal inference
