Tempered Sequential Monte Carlo for Trajectory and Policy Optimization with Differentiable Dynamics
Heng Yang

TL;DR
This paper introduces Tempered Sequential Monte Carlo (TSMC), a sampling method for trajectory and policy optimization in systems with differentiable dynamics. It combines an annealing schedule with Hamiltonian Monte Carlo to draw diverse samples efficiently.
Contribution
The paper develops TSMC, an annealing-based sampling framework for trajectory and policy optimization that maintains particle diversity and exploits gradients when the target distribution is multimodal.
Findings
TSMC effectively samples from complex, multimodal distributions.
It outperforms state-of-the-art baselines in benchmark tasks.
The method is broadly applicable to trajectory and policy optimization.
Abstract
We propose a sampling-based framework for finite-horizon trajectory and policy optimization under differentiable dynamics by casting controller design as inference. Specifically, we minimize a KL-regularized expected trajectory cost, which yields an optimal "Boltzmann-tilted" distribution over controller parameters that concentrates on low-cost solutions as temperature decreases. To sample efficiently from this sharp, potentially multimodal target, we introduce tempered sequential Monte Carlo (TSMC): an annealing scheme that adaptively reweights and resamples particles along a tempering path from a prior to the target distribution, while using Hamiltonian Monte Carlo rejuvenation to maintain diversity and exploit exact gradients obtained by differentiating through trajectory rollouts. For policy optimization, we extend TSMC via (i) a deterministic empirical approximation of the…
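The annealing loop described in the abstract can be illustrated with a minimal sketch on a toy problem. This is not the authors' implementation. It assumes a hypothetical one-dimensional double-well cost in place of a trajectory rollout, a Gaussian prior, analytic gradients in place of differentiation through dynamics, an effective-sample-size rule for choosing the next temperature, and multinomial resampling. All function names and constants (`cost`, `SIGMA`, `lam`, `ess_frac`, step sizes) are illustrative choices, not values from the paper.

```python
import numpy as np

SIGMA = 2.0  # assumed prior standard deviation (illustrative)

def cost(theta):
    # Hypothetical double-well cost with minima at theta = -1 and theta = +1.
    return (theta**2 - 1.0) ** 2

def cost_grad(theta):
    return 4.0 * theta * (theta**2 - 1.0)

def log_target(theta, beta, lam):
    # Tempered log-density (up to a constant): log prior - beta * cost / lam.
    return -0.5 * (theta / SIGMA) ** 2 - beta * cost(theta) / lam

def log_target_grad(theta, beta, lam):
    return -theta / SIGMA**2 - beta * cost_grad(theta) / lam

def ess(logw):
    # Effective sample size of (unnormalized) log-weights.
    w = np.exp(logw - logw.max())
    return w.sum() ** 2 / (w**2).sum()

def next_beta(beta, costs, lam, target_ess):
    # Bisection for the largest step whose incremental weights keep ESS >= target.
    if ess(-(1.0 - beta) * costs / lam) >= target_ess:
        return 1.0
    lo, hi = beta, 1.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if ess(-(mid - beta) * costs / lam) >= target_ess:
            lo = mid
        else:
            hi = mid
    return min(1.0, max(lo, beta + 1e-4))  # floor guarantees progress

def hmc_step(theta, beta, lam, rng, step=0.05, n_leap=10):
    # One HMC move per particle (vectorized), targeting the tempered density.
    p0 = rng.standard_normal(theta.shape)
    x = theta.copy()
    p = p0 + 0.5 * step * log_target_grad(x, beta, lam)
    for i in range(n_leap):
        x = x + step * p
        scale = step if i < n_leap - 1 else 0.5 * step
        p = p + scale * log_target_grad(x, beta, lam)
    log_accept = (log_target(x, beta, lam) - 0.5 * p**2) - (
        log_target(theta, beta, lam) - 0.5 * p0**2
    )
    accept = np.log(rng.uniform(size=theta.shape)) < log_accept
    return np.where(accept, x, theta)

def tempered_smc(n=2000, lam=0.1, ess_frac=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta = SIGMA * rng.standard_normal(n)  # particles drawn from the prior
    beta, betas = 0.0, [0.0]
    while beta < 1.0:
        costs = cost(theta)
        new_beta = next_beta(beta, costs, lam, ess_frac * n)
        logw = -(new_beta - beta) * costs / lam      # incremental reweighting
        w = np.exp(logw - logw.max())
        w /= w.sum()
        theta = theta[rng.choice(n, size=n, p=w)]    # multinomial resampling
        beta = new_beta
        theta = hmc_step(theta, beta, lam, rng)      # rejuvenation
        betas.append(beta)
    return theta, betas

theta, betas = tempered_smc()
print(f"{len(betas) - 1} tempering steps, "
      f"fraction in right-hand mode: {(theta > 0).mean():.2f}")
```

In this sketch the particles start spread across the prior and end concentrated near both minima of the toy cost, which is the behavior the tempering path is meant to preserve for multimodal targets. Lowering `lam` sharpens the target and typically increases the number of tempering steps the adaptive rule selects.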
