Model-Driven Policy Optimization in Differentiable Simulators via Stochastic Exploration
Yuval Aroosh, Ayal Taitler

TL;DR
This paper introduces MDPO, a stochastic exploration framework for differentiable planning that improves optimization in complex nonlinear and hybrid domains by adaptively injecting noise into the action space.
Contribution
MDPO is a novel method that dynamically adjusts exploration noise based on gradient sensitivity, enhancing optimization in differentiable simulators.
Findings
MDPO outperforms deterministic planning and model-free baselines on benchmark tasks.
Adaptive noise scheduling improves exploration and solution quality.
Analysis shows how the allocated exploration varies across timesteps and optimization iterations.
Abstract
Differentiable planning enables gradient-based optimization of decision-making problems by leveraging differentiable models of system dynamics. However, in highly nonlinear and hybrid discrete-continuous domains, the resulting optimization landscapes are often ill-conditioned, with flat regions and sharp transitions that hinder effective optimization. We propose Model-Driven Policy Optimization (MDPO), a framework that introduces stochastic exploration into differentiable planning by injecting noise into the action space during optimization. Leveraging access to the model, MDPO further adapts the noise magnitude based on gradient-derived sensitivity of the trajectory objective, yielding a time-dependent exploration profile. This enables improved exploration of the objective landscape and helps escape poor local optima via dynamic allocation of exploration across timesteps and…
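The abstract describes the mechanism only at a high level, and it is truncated here. As a rough illustration, the following is a minimal NumPy sketch of noise-injected gradient-based planning with a per-timestep, sensitivity-dependent noise scale. It is not the authors' implementation. The single-integrator dynamics, the quadratic objective, the rule that assigns more noise to timesteps with smaller gradient magnitude, the linear annealing over iterations, and all function names (`rollout`, `objective_and_grad`, `adaptive_sigma`, `plan`) are hypothetical assumptions chosen for brevity.

```python
import numpy as np


def rollout(x0, actions, dt=0.1):
    # Hypothetical single-integrator dynamics: x_{t+1} = x_t + dt * a_t.
    # Returns the visited states x_1, ..., x_T.
    return x0 + dt * np.cumsum(actions)


def objective_and_grad(x0, actions, goal, dt=0.1, lam=0.01):
    # Hypothetical trajectory cost: squared distance to the goal plus an action penalty.
    err = rollout(x0, actions, dt) - goal
    J = np.sum(err ** 2) + lam * np.sum(actions ** 2)
    # Analytic gradient (standing in for what a differentiable simulator provides):
    # dJ/da_t = 2 * dt * sum_{k >= t} err_k + 2 * lam * a_t
    grad = 2.0 * dt * np.cumsum(err[::-1])[::-1] + 2.0 * lam * actions
    return J, grad


def adaptive_sigma(grad, sigma_min=0.01, sigma_max=0.5, eps=1e-8):
    # Hypothetical sensitivity rule: timesteps whose gradient magnitude is small
    # (locally flat objective) receive more exploration noise.
    sens = np.abs(grad) / (np.max(np.abs(grad)) + eps)
    return sigma_min + (sigma_max - sigma_min) * (1.0 - sens)


def plan(x0=0.0, goal=1.0, horizon=20, iters=200, lr=0.3, seed=0):
    rng = np.random.default_rng(seed)
    actions = np.zeros(horizon)
    best_J, best = objective_and_grad(x0, actions, goal)[0], actions.copy()
    for i in range(iters):
        # 1) Per-timestep noise scale from the sensitivity of the current plan,
        #    linearly annealed over iterations (also an assumption).
        _, grad = objective_and_grad(x0, actions, goal)
        sigma = (1.0 - i / iters) * adaptive_sigma(grad)
        # 2) Inject noise into the action sequence and differentiate there.
        noisy = actions + sigma * rng.standard_normal(horizon)
        J_noisy, grad_noisy = objective_and_grad(x0, noisy, goal)
        if J_noisy < best_J:  # keep the best plan seen during exploration
            best_J, best = J_noisy, noisy.copy()
        # 3) Update the clean plan with the gradient taken at the noisy point.
        actions = actions - lr * grad_noisy
    J_final, _ = objective_and_grad(x0, actions, goal)
    if J_final < best_J:
        best_J, best = J_final, actions.copy()
    return best, best_J


best_actions, best_cost = plan()
print(f"best cost: {best_cost:.4f}")
```

Because this toy objective is convex, the sketch cannot show escape from poor local optima; it only illustrates the mechanics of perturbing actions, differentiating through the perturbed rollout, and scaling the noise per timestep. In a nonlinear or hybrid simulator, the hand-derived gradient would be replaced by automatic differentiation through the model.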
