Model-Driven Policy Optimization in Differentiable Simulators via Stochastic Exploration

Yuval Aroosh; Ayal Taitler

arXiv:2605.07520 · cs.AI · May 11, 2026

TL;DR

This paper introduces MDPO, a stochastic exploration framework for differentiable planning. It improves optimization in complex nonlinear and hybrid discrete-continuous domains by adaptively injecting noise into the action space.

Contribution

MDPO is a new method that adjusts the exploration noise at each timestep according to the gradient-derived sensitivity of the trajectory objective, which improves optimization in differentiable simulators.
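
The summary does not say exactly how the sensitivity is computed. One standard way to get a per-timestep sensitivity from a differentiable model is reverse-mode differentiation (backpropagation through time) of the rollout. Assume deterministic dynamics x_{t+1} = f(x_t, a_t) and an objective J = sum_{t=0}^{T-1} r(x_t, a_t) + r_T(x_T). Under those assumptions, the costate p_t = dJ/dx_t satisfies the recursion below. Taking the sensitivity s_t to be the norm of dJ/da_t is our own assumption and may not match the paper's definition.

```latex
\begin{aligned}
p_T &= \frac{\partial r_T}{\partial x_T}, \\
p_t &= \frac{\partial r}{\partial x_t}(x_t, a_t) + p_{t+1}\,\frac{\partial f}{\partial x_t}(x_t, a_t), \qquad t = T-1, \dots, 1, \\
\frac{\partial J}{\partial a_t} &= \frac{\partial r}{\partial a_t}(x_t, a_t) + p_{t+1}\,\frac{\partial f}{\partial a_t}(x_t, a_t), \qquad t = 0, \dots, T-1, \\
s_t &= \left\lVert \frac{\partial J}{\partial a_t} \right\rVert .
\end{aligned}
```

One forward and one backward pass through the simulator yield every s_t, whatever the horizon T.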

Findings

1. MDPO outperforms deterministic planning and model-free baselines on benchmark tasks.
2. Adaptive noise scheduling improves exploration and solution quality.
3. An analysis shows how the exploration magnitude varies across timesteps and optimization iterations.

Abstract

Differentiable planning enables gradient-based optimization of decision-making problems by leveraging differentiable models of system dynamics. However, in highly nonlinear and hybrid discrete-continuous domains, the resulting optimization landscapes are often ill-conditioned, with flat regions and sharp transitions that hinder effective optimization. We propose Model-Driven Policy Optimization (MDPO), a framework that introduces stochastic exploration into differentiable planning by injecting noise into the action space during optimization. Leveraging access to the model, MDPO further adapts the noise magnitude based on gradient-derived sensitivity of the trajectory objective, yielding a time-dependent exploration profile. This enables improved exploration of the objective landscape and helps escape poor local optima via dynamic allocation of exploration across timesteps and…
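
The abstract describes the mechanism only at a high level. What follows is a minimal sketch under stated assumptions, not the authors' implementation:

- A toy linear system x_{t+1} = gamma * x_t + a_t stands in for the differentiable simulator.
- The gradient with respect to each action comes from a hand-written backward pass through this known model.
- Each gradient step is evaluated at a noise-perturbed copy of the action sequence.

Several pieces are our own hypothetical choices, and the paper's actual sensitivity measure and schedule are not given here:

- the functions `noise_scales` and `mdpo_sketch`, and all parameter values;
- the inverse-sensitivity rule, which puts more noise on timesteps whose gradient magnitude is small;
- the geometric decay of the noise across iterations.

```python
import random


def rollout(actions, x0=0.0, gamma=0.9):
    """Simulate the toy dynamics x_{t+1} = gamma * x_t + a_t; return all states."""
    xs = [x0]
    for a in actions:
        xs.append(gamma * xs[-1] + a)
    return xs


def cost(actions, goal=1.0, x0=0.0, gamma=0.9, lam=0.01):
    """Squared terminal error plus a small action-energy penalty."""
    x_T = rollout(actions, x0, gamma)[-1]
    return (x_T - goal) ** 2 + lam * sum(a * a for a in actions)


def gradient(actions, goal=1.0, x0=0.0, gamma=0.9, lam=0.01):
    """Exact dJ/da_t from a hand-written backward pass through the known model."""
    x_T = rollout(actions, x0, gamma)[-1]
    adjoint = 2.0 * (x_T - goal)  # dJ/dx_T
    grads = [0.0] * len(actions)
    for t in reversed(range(len(actions))):
        # Here adjoint holds dJ/dx_{t+1}, and dx_{t+1}/da_t = 1.
        grads[t] = adjoint + 2.0 * lam * actions[t]
        adjoint *= gamma  # chain rule: dJ/dx_t = gamma * dJ/dx_{t+1}
    return grads


def noise_scales(grads, sigma_max=0.5, eps=1e-8):
    """Hypothetical rule: noise shrinks as |dJ/da_t| grows, so timesteps with
    little gradient signal (flat regions) get the most exploration."""
    mags = [abs(g) for g in grads]
    top = max(mags) + eps
    return [sigma_max * (1.0 - m / top) for m in mags]


def mdpo_sketch(T=10, iters=200, lr=0.05, sigma_max=0.5, decay=0.98, seed=0):
    """Gradient descent on the action sequence, with each gradient evaluated
    at a noise-perturbed copy of the actions (stochastic exploration)."""
    rng = random.Random(seed)
    actions = [0.0] * T
    best, best_cost = list(actions), cost(actions)
    for k in range(iters):
        # Per-timestep noise from sensitivity, annealed across iterations (assumed).
        sigmas = [s * decay ** k for s in noise_scales(gradient(actions), sigma_max)]
        noisy = [a + s * rng.gauss(0.0, 1.0) for a, s in zip(actions, sigmas)]
        actions = [a - lr * g for a, g in zip(actions, gradient(noisy))]
        c = cost(actions)
        if c < best_cost:  # keep the best noise-free solution seen so far
            best, best_cost = list(actions), c
    return best, best_cost


if __name__ == "__main__":
    plan, final_cost = mdpo_sketch()
    print(f"best cost: {final_cost:.4f}")
```

Calling `mdpo_sketch()` returns the best action sequence found and its noise-free cost. The toy objective is convex, so the sketch only demonstrates the mechanics of sensitivity-scaled action noise. It does not reproduce the escape from poor local optima in ill-conditioned landscapes that motivates MDPO.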
