Accelerating Multi-modal LLM Gaming Performance via Input Prediction and Mishit Correction
Ziyang Lin, Zixuan Sun, Sanhorn Chen, Xiaoyang Chen, Roy Zhao

TL;DR
This paper introduces a speculative execution framework with mismatch-aware correction for model-based control in real-time multi-modal LLM gaming, significantly reducing inference latency while maintaining control performance.
Contribution
It proposes a novel speculation-and-correction approach that combines latent-space MPC planning with learned residual correction to improve real-time control efficiency.
Findings
Reduces planning inferences from 500 to 282
Reduces end-to-end latency by 25%
Maintains 92.9% of original control performance
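As a quick check on the first finding, the relative drop in planning inferences follows directly from the two reported counts. This uses only the numbers stated above:

```latex
\[
\frac{500 - 282}{500} = \frac{218}{500} = 0.436
\]
```

That is, about 43.6% fewer planner calls. The 25% end-to-end latency figure is reported as a separate measurement and is not derived from this ratio.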
Abstract
Real-time sequential control agents are often bottlenecked by inference latency. Even modest per-step planning delays can destabilize control and degrade overall performance. We propose a speculation-and-correction framework that adapts the predict-then-verify philosophy of speculative execution to model-based control with TD-MPC2. At each step, a pretrained world model and latent-space MPC planner generate a short-horizon action queue together with predicted latent rollouts, allowing the agent to execute multiple planned actions without immediate replanning. When a new observation arrives, the system measures the mismatch between the encoded real latent state and the queued predicted latent. For small to moderate mismatch, a lightweight learned corrector applies a residual update to the speculative action, distilled offline from a replanning teacher. For large mismatch, the agent…
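The mismatch check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Euclidean mismatch metric, the single `threshold` parameter, the function names, and the corrector signature are all assumptions made for illustration. Because the abstract is truncated before it says what happens on large mismatch, the sketch only signals that case by returning `None` and does not guess at the handling.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]
# Hypothetical corrector signature: (real latent, predicted latent, queued action) -> residual
Corrector = Callable[[Vector, Vector, Vector], List[float]]


def latent_mismatch(z_real: Vector, z_pred: Vector) -> float:
    """Euclidean distance between real and predicted latents (assumed metric)."""
    if len(z_real) != len(z_pred):
        raise ValueError("latent vectors must have the same length")
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(z_real, z_pred)))


def speculative_step(
    z_real: Vector,
    z_pred: Vector,
    queued_action: Vector,
    corrector: Corrector,
    threshold: float,
) -> Optional[List[float]]:
    """Return the residual-corrected queued action, or None on large mismatch.

    `threshold` is a hypothetical cutoff between the "small to moderate"
    and "large" mismatch regimes named in the abstract.
    """
    if latent_mismatch(z_real, z_pred) > threshold:
        # Large-mismatch branch: the excerpt is truncated here, so the
        # caller decides what to do.
        return None
    residual = corrector(z_real, z_pred, queued_action)
    # Residual update: the speculative action plus a learned correction.
    return [a + d for a, d in zip(queued_action, residual)]


def toy_corrector(z_real: Vector, z_pred: Vector, action: Vector) -> List[float]:
    """Stand-in for a learned corrector: shifts every action dimension by
    half of the error in the first latent dimension."""
    shift = 0.5 * (z_real[0] - z_pred[0])
    return [shift for _ in action]
```

For example, `speculative_step([1.0, 0.0], [0.0, 0.0], [0.2, -0.2], toy_corrector, threshold=2.0)` applies the toy residual to the queued action, while the same call with `threshold=0.5` returns `None`. In the paper's setting the corrector would be a small network distilled offline from a replanning teacher. The toy function here only stands in for that interface.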
Taxonomy
Topics: Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety · Machine Learning in Healthcare
