Unlocking Smarter Device Control: Foresighted Planning with a World Model-Driven Code Execution Approach
Xiaoran Yin, Xu Luo, Hao Wu, Lianli Gao, Jingkuan Song

TL;DR
This paper introduces FPWC, a foresighted planning framework that uses a world model-driven code execution approach to improve mobile device control by enabling better reasoning and decision-making in complex tasks.
Contribution
It presents a novel framework combining structured world models with code execution for foresighted planning in device control tasks.
Findings
Achieves a 44.4% relative improvement in task success rate in simulations.
Outperforms previous reactive policy approaches.
Demonstrates effectiveness on real mobile devices.
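The "relative improvement" in the first finding is a ratio against the baseline rate, not a difference in percentage points. The short sketch below shows the distinction. The two success rates are made-up illustrative values, not figures from the paper, and the function names are hypothetical.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Gain expressed as a fraction of the baseline success rate."""
    if baseline <= 0:
        raise ValueError("baseline success rate must be positive")
    return (new - baseline) / baseline


def absolute_improvement(baseline: float, new: float) -> float:
    """Gain expressed as a plain difference between the two rates."""
    return new - baseline


# Illustrative values only (NOT taken from the paper).
baseline_rate = 0.50
new_rate = 0.60

rel = relative_improvement(baseline_rate, new_rate)  # fraction of the baseline
abs_gain = absolute_improvement(baseline_rate, new_rate)  # difference in rate
print(f"relative: {rel:.1%}, absolute: {abs_gain * 100:.1f} points")
```

Because the relative figure depends on the baseline, the same relative gain can correspond to quite different absolute gains.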
Abstract
The automatic control of mobile devices is essential for efficiently performing complex tasks that involve multiple sequential steps. However, these tasks pose significant challenges due to the limited environmental information available at each step, primarily through visual observations. As a result, current approaches, which typically rely on reactive policies, focus solely on immediate observations and often lead to suboptimal decision-making. To address this problem, we propose Foresighted Planning with World Model-Driven Code Execution (FPWC), a framework that prioritizes natural language understanding and structured reasoning to enhance the agent's global understanding of the environment by developing a task-oriented, refinable world model at the outset of the task. Foresighted actions are subsequently generated through iterative planning within this world model,…
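To make the contrast with a reactive policy concrete, here is a toy sketch of planning inside an explicit, refinable world model. It is not the authors' implementation: the screen names, the action names, the dictionary representation, and the `plan` and `refine` helpers are all assumptions made for illustration, and breadth-first search stands in for whatever planner the paper actually uses.

```python
from collections import deque

# Hypothetical world model: each screen maps an action to the screen it leads to.
WORLD_MODEL = {
    "home": {"open_settings": "settings", "open_browser": "browser"},
    "settings": {"tap_wifi": "wifi_panel", "back": "home"},
    "wifi_panel": {"toggle_wifi": "wifi_on", "back": "settings"},
    "browser": {"back": "home"},
    "wifi_on": {},
}


def plan(model, start, goal):
    """Breadth-first search over the model; returns a shortest action list or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for action, next_state in model.get(state, {}).items():
            if next_state not in visited:
                visited.add(next_state)
                queue.append((next_state, actions + [action]))
    return None  # goal is unreachable under the current model


def refine(model, state, action, observed_next):
    """Update the model when an executed action lands on an unexpected screen."""
    model.setdefault(state, {})[action] = observed_next
    model.setdefault(observed_next, {})


print(plan(WORLD_MODEL, "home", "wifi_on"))
```

A reactive policy would pick one action per screenshot. In this sketch the whole action sequence is derived before acting, and `refine` lets the model be corrected and the plan recomputed when an observation contradicts it.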
Taxonomy
Topics: AI-based Problem Solving and Planning · Reinforcement Learning in Robotics · Artificial Intelligence in Games
