Model-Based Diffusion Sampling for Predictive Control in Offline Decision Making
Haldun Balim, Na Li, Yilun Du

TL;DR
This paper introduces MPDiffuser, a diffusion-based framework that combines planning and dynamics models to generate feasible, task-aligned trajectories for offline decision-making, improving efficiency and adaptability.
Contribution
The paper presents a compositional diffusion framework that integrates a diffusion planner with a dynamics diffusion model to generate trajectories that are both task-aligned and dynamically plausible.
Findings
Improves sample efficiency over prior methods
Achieves better task alignment and feasibility
Successfully deployed on a real quadrupedal robot
Abstract
Offline decision-making via diffusion models often produces trajectories that are misaligned with system dynamics, limiting their reliability for control. We propose Model Predictive Diffuser (MPDiffuser), a compositional diffusion framework that combines a diffusion planner with a dynamics diffusion model to generate task-aligned and dynamically plausible trajectories. MPDiffuser interleaves planner and dynamics updates during sampling, progressively correcting feasibility while preserving task intent. A lightweight ranking module then selects trajectories that best satisfy task objectives. The compositional design improves sample efficiency and adaptability by enabling the dynamics model to leverage diverse and previously unseen data independently of the planner. Empirically, we demonstrate consistent improvements over prior diffusion-based methods on unconstrained (D4RL) and…
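The interleaving described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `planner_step`, `dynamics_step`, and `rank` are hypothetical stand-ins for the learned planner, the learned dynamics diffusion model, and the ranking module, and the "dynamics" here is an assumed bound on how far the state can move per time step. The sketch only shows the control flow: alternate a task-directed update with a feasibility-directed update, then pick the best candidate.

```python
import numpy as np

def planner_step(traj, goal, step=0.2):
    """Hypothetical stand-in for a planner denoising update:
    nudge every state in the trajectory toward the goal."""
    return traj + step * (goal - traj)

def dynamics_step(traj, max_delta=0.5):
    """Hypothetical stand-in for a dynamics denoising update:
    enforce an assumed per-step limit on state change."""
    out = traj.copy()
    for t in range(1, len(out)):
        delta = np.clip(out[t] - out[t - 1], -max_delta, max_delta)
        out[t] = out[t - 1] + delta
    return out

def sample_trajectory(rng, start, goal, horizon=16, n_steps=20):
    """Start from noise and interleave planner and dynamics updates."""
    traj = rng.normal(size=(horizon, start.shape[0]))
    for _ in range(n_steps):
        traj = planner_step(traj, goal)   # task-directed update
        traj[0] = start                   # condition on the current state
        traj = dynamics_step(traj)        # feasibility-directed update
    return traj

def rank(candidates, goal):
    """Hypothetical ranker: prefer the trajectory ending closest to the goal."""
    scores = [-np.linalg.norm(c[-1] - goal) for c in candidates]
    return candidates[int(np.argmax(scores))]

rng = np.random.default_rng(0)
start, goal = np.zeros(2), np.array([3.0, -2.0])
candidates = [sample_trajectory(rng, start, goal) for _ in range(8)]
best = rank(candidates, goal)
```

Because the dynamics update runs last in each iteration, every returned trajectory satisfies the assumed step limit, while the planner updates keep pulling it toward the goal. In the paper both updates are learned diffusion models rather than the hand-written rules used here.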
Peer Reviews
Decision · Submitted to ICLR 2026
Addresses a clear weakness of current decision diffusers: Most diffusion-based planners ignore system dynamics and often generate trajectories that are not physically realizable. Alternating between planning and dynamics correction feels like a natural but powerful extension.
Elegant modular design: The architecture splits the problem into three parts: planner, dynamics module, and a ranker. Each is separately trained and conceptually clear, which improves interpretability and reproducibility. T
Still limited to state-based tasks, no vision input: All experiments assume full-state observations. It is unclear if the method can scale to high-dimensional visual inputs or work jointly with latent diffusion policies.
No direct comparison to strong model-based RL or world models (Dreamer, TD-MPC2): Since this is a model-based method, I expected comparisons to world-model-based planners, not just decision diffusers.
Scalability and inference cost not fully discussed: Alternating between two di
1. The overall motivation is clear: consistency between planning and dynamics is an important goal, especially in offline RL settings.
2. The experiments are fairly comprehensive. While some important baselines are missing, the current results are still sufficient to support the main claim that this iterative learning approach can improve policy performance.
3. The overall presentation is clear and easy to follow.
I list both weaknesses and questions together here, since many of them overlap.
1. My main question is whether the method truly achieves dynamics consistency. As I understand it, the dynamics model is also conditioned on the target variables $y$. This can make the generated trajectories biased toward those targets, effectively learning “goal-conditioned” dynamics rather than the true environment dynamics. Unless the dataset has good coverage over target variables, the learned dynamics may stil
- The paper provides a comprehensive comparison between the proposed method and baselines on common benchmarks, with an application to real-world robots.
- By learning a dynamics model, the proposed method can make better use of low-quality data to further improve sample efficiency and generation quality.
- Although the authors provide a comprehensive study of baselines such as D-MPC and Decision Diffuser, it remains unclear why the proposed MPDiffuser is better than other methods in terms of dynamical feasibility. Assuming all methods learn a correct dynamics model, all generated trajectories should be feasible.
- The role of the ranker module and its contribution to final performance is missing. It would be helpful to comment on the ranker module’s differences from and connections with o
Taxonomy
Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference
