APEX: Empowering LLMs with Physics-Based Task Planning for Real-time Insight

Wanjing Huang; Weixiang Yan; Zhen Zhang; Ambuj Singh

arXiv:2505.13921 · cs.RO · October 17, 2025

Open Access · 1 Repo · 4 Reviews

TL;DR

APEX enhances large language models with physics-based foresight, enabling better real-time task planning and decision-making in dynamic environments by modeling object interactions and predicting outcomes.

Contribution

This work introduces APEX, a novel framework that integrates physics reasoning into LLMs for improved real-world task planning without task-specific training.

Findings

1. APEX outperforms standard LLMs and VLM-based models on benchmarks.
2. Physics-informed prediction improves long-horizon planning.
3. Explicit physics reasoning bridges the gap between language and real-world tasks.

Abstract

Large Language Models (LLMs) demonstrate strong reasoning and task planning capabilities but remain fundamentally limited in physical interaction modeling. Existing approaches integrate perception via Vision-Language Models (VLMs) or adaptive decision-making through Reinforcement Learning (RL), but they fail to capture dynamic object interactions or require task-specific training, limiting their real-world applicability. We introduce APEX (Anticipatory Physics-Enhanced Execution), a framework that equips LLMs with physics-driven foresight for real-time task planning. APEX constructs structured graphs to identify and model the most relevant dynamic interactions in the environment, providing LLMs with explicit physical state updates. Simultaneously, APEX provides low-latency forward simulations of physically feasible actions, allowing LLMs to select optimal strategies based on predictive…
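The graph-construction step mentioned in the abstract can be pictured with a minimal sketch. This is an illustration only, not the authors' implementation: the `Body` type, the `build_interaction_graph` function, and the distance and closing-speed thresholds are all hypothetical choices made here to show one plausible way of keeping only the object pairs that are close and approaching each other.

```python
from dataclasses import dataclass
from itertools import combinations
import math


@dataclass(frozen=True)
class Body:
    """Hypothetical 2D object state: position and velocity."""
    name: str
    x: float
    y: float
    vx: float
    vy: float


def closing_speed(a: Body, b: Body) -> float:
    """Rate at which the a-b distance shrinks (positive means approaching)."""
    dx, dy = b.x - a.x, b.y - a.y
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return 0.0
    dvx, dvy = b.vx - a.vx, b.vy - a.vy
    # d(dist)/dt = (d . dv) / |d|; negate so that approaching is positive.
    return -(dx * dvx + dy * dvy) / dist


def build_interaction_graph(bodies, max_dist=5.0, min_closing=0.1):
    """Keep an edge only for pairs that are near AND moving toward each other.

    Both thresholds are assumptions for this sketch, not values from the paper.
    """
    edges = {}
    for a, b in combinations(bodies, 2):
        dist = math.hypot(b.x - a.x, b.y - a.y)
        speed = closing_speed(a, b)
        if dist <= max_dist and speed >= min_closing:
            edges[(a.name, b.name)] = {"distance": dist, "closing_speed": speed}
    return edges


bodies = [
    Body("agent", 0.0, 0.0, 1.0, 0.0),
    Body("ball", 4.0, 0.0, -1.0, 0.0),   # heading straight at the agent
    Body("crate", 0.0, 30.0, 0.0, 1.0),  # far away and receding
]
graph = build_interaction_graph(bodies)
print(graph)
```

In this toy scene only the agent-ball pair survives the filter, so a downstream planner would receive a single edge to reason about rather than every pairwise relation.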

Peer Reviews

Decision · ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 2 · Confidence 2

Strengths

- Pragmatic integration of a physics engine into an LLM loop; the overall system is easy to understand and replicate in spirit.
- Clear problem motivation: teaching LLM agents to rely on external physics tools rather than internalizing fragile physical heuristics is a sensible direction.
- Breadth of tasks (synthetic QA, games/toy control, PHYRE) demonstrates that the loop can be wired up across settings.

Weaknesses

- limited methodological novelty
- single-step lookahead: the algorithmic core is a 1-step brute-force evaluation of enumerated actions; claims and tables about multi-step/higher-horizon complexity are not matched by controlled, implemented evidence.
- Tetris lacks standard hand-coded/Tetris-AI baselines (e.g., height/holes/bumpiness heuristics, MCTS with lookahead).
- Obstacle avoidance omits classic MPC/DWA/A*/RRT with the same simulator and budget.
- PHYRE relies on 10k uniformly random actio

Reviewer 02 · Rating 2 · Confidence 4

Strengths

- The paper attempts to address the physical reasoning limitations of LLMs by making use of physics engines.
- The paper provides some concrete examples of tasks and model outputs in the appendix.

Weaknesses

- Overall, the introduced pipeline is very limiting. The framework makes many assumptions, but none are explicitly described in the paper. For example, the scene graph formulation assumes the scene contains mostly distinctive and rigid objects. Since the predictions are simulated with a physical engine, the method also assumes those objects are compatible with the simulator. What constraints are imposed on the acceptable objects in the scene? Moreover, the decision-making procedure requires a fi

Reviewer 03 · Rating 2 · Confidence 4

Strengths

1. Interesting high-level motivation. Bridging symbolic reasoning in LLMs with physically grounded modeling is an important and timely goal.
2. Attempt to unify physics reasoning and LLM planning. The modular architecture (graph → simulator → LLM → action) provides a readable system outline.
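The simulator → LLM → action portion of that outline, in the single-step form the reviews describe (simulate each enumerated action, then hand the outcomes to the model), might look roughly like the sketch below. Everything here is an assumption for illustration: the constant-velocity `simulate` function stands in for a real physics engine, the `ACTIONS` set and `safe_dist` threshold are invented, and `choose_action` is a deterministic placeholder for the LLM's decision rather than an actual model call.

```python
import math

# Hypothetical discrete action set: each action is an agent velocity (vx, vy).
ACTIONS = {"stay": (0.0, 0.0), "left": (-1.0, 0.0), "right": (1.0, 0.0), "up": (0.0, 1.0)}


def simulate(agent, obstacle, action, dt=0.5, steps=4):
    """Toy constant-velocity rollout; returns the minimum agent-obstacle distance.

    A real system would call a physics engine here; this is a stand-in.
    """
    ax, ay = agent
    ox, oy, ovx, ovy = obstacle
    avx, avy = action
    min_dist = math.hypot(ox - ax, oy - ay)
    for _ in range(steps):
        ax, ay = ax + avx * dt, ay + avy * dt
        ox, oy = ox + ovx * dt, oy + ovy * dt
        min_dist = min(min_dist, math.hypot(ox - ax, oy - ay))
    return min_dist


def evaluate_actions(agent, obstacle):
    """Brute-force lookahead: roll out every candidate action once."""
    return {name: simulate(agent, obstacle, vel) for name, vel in ACTIONS.items()}


def format_for_llm(outcomes, safe_dist=1.0):
    """Turn simulated outcomes into text that could be placed in a prompt."""
    lines = []
    for name, dist in outcomes.items():
        status = "safe" if dist >= safe_dist else "collision risk"
        lines.append(f"{name}: min distance {dist:.2f} ({status})")
    return "\n".join(lines)


def choose_action(outcomes):
    """Placeholder for the LLM's choice: pick the action with most clearance."""
    return max(outcomes, key=outcomes.get)


agent = (0.0, 0.0)
obstacle = (0.0, 2.0, 0.0, -1.0)  # directly above the agent, falling toward it
outcomes = evaluate_actions(agent, obstacle)
report = format_for_llm(outcomes)
print(report)
print("chosen:", choose_action(outcomes))
```

With the obstacle falling straight down, staying put or moving up leads to contact in this toy rollout, while stepping sideways keeps clearance; the text report is the kind of explicit physical feedback a language model could condition on.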

Weaknesses

1. Conceptual novelty is limited. The core idea—using a physics engine to simulate candidate actions and feeding the results back to an LLM—is conceptually straightforward and has appeared in prior “simulation-in-the-loop” or “world-model prompting” works (e.g., Mind’s Eye, PiLoT, PhysVLM). The proposed Perception–Graph–Language–Physics–Action paradigm is mostly a re-labeling of existing perception-simulation-planning loops in robotics; there is no theoretical or algorithmic advance beyond modul

Reviewer 04 · Rating 4 · Confidence 4

Strengths

1. Introduces a novel graph-simulation-loop architecture to help LLMs in physical reasoning.
2. Evaluation across three diverse benchmarks with strong baselines and ablations.
3. Well-organized and accessible, with clear explanations of both high-level ideas and implementation details.

Weaknesses

1. The framework's applicability is currently limited to motion-related problems, particularly collision prediction. This narrow scope and its reliance on a specific simulator make it brittle; adapting it to new tasks or physical phenomena likely requires significant re-implementation.
2. To demonstrate generalizability, future work should include experiments in more complex settings, such as those with cluttered scenes or deformable objects.
3. Although APEX's modularity is a strength, the pape

Code & Models

Repositories

hwj20/apex_exp · PyTorch · Official

Taxonomy

Topics: Data Stream Mining Techniques · Scientific Computing and Data Management · Cloud Computing and Resource Management

Methods: Causal inference