On Globally Optimal Stochastic Policy Gradient Methods for Domain Randomized LQR Synthesis

Alex Nguyen-Le; Nikolai Matni

arXiv:2603.14197 · eess.SY · March 17, 2026

Open Access

TL;DR

This paper develops a theoretically grounded stochastic policy gradient method for domain randomized LQR synthesis, demonstrating convergence to global optima and improved controller robustness through repeated system sampling.

Contribution

The paper introduces a stochastic policy gradient descent method for domain randomized LQR synthesis that samples new systems at each gradient step. With appropriate hyperparameter choices the method converges to global optima, and the repeated sampling yields better controllers.

Findings

01

Stochastic policy gradient descent converges to global optima under appropriate hyperparameter choices.

02

Resampling systems at each iteration improves controller robustness.

03

Repeated system sampling is computationally efficient and yields better controllers with lower variability.

Abstract

Domain randomization is a simple, effective, and flexible scheme for obtaining robust feedback policies aimed at reducing the sim-to-real gap due to model mismatch. While domain randomization methods have yielded impressive demonstrations in the robotics-learning literature, general and theoretically motivated principles for designing optimization schemes that effectively leverage the randomization are largely unexplored. We address this gap by considering a stochastic policy gradient descent method for the domain randomized linear-quadratic regulator synthesis problem, a situation simple enough to provide theoretical guarantees. In particular, we demonstrate that stochastic gradients obtained by repeatedly sampling new systems at each gradient step converge to global optima with appropriate hyperparameter choices, and yield better controllers with lower variability in the final…
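
To make the idea in the abstract concrete, the following is a minimal sketch of stochastic policy gradient descent for a domain randomized LQR problem, not the authors' implementation. It assumes a discrete-time system x_{t+1} = A x_t + B u_t with static feedback u_t = -K x_t, the infinite-horizon cost trace(P_K Sigma0), and the standard closed-form LQR policy gradient. The names `dlyap`, `lqr_cost_and_grad`, `dr_policy_gradient`, and `sample_scalar_system`, as well as the step size, batch size, and the uniform distribution over A, are hypothetical choices made for illustration only.

```python
import numpy as np


def dlyap(M, W):
    """Solve X = M X M^T + W by vectorization (adequate for small state dimension)."""
    n = M.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(M, M), W.reshape(-1))
    return x.reshape(n, n)


def lqr_cost_and_grad(K, A, B, Q, R, Sigma0):
    """Cost trace(P_K Sigma0) and its gradient in K for one system (A, B)."""
    Acl = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        raise ValueError("K does not stabilize the sampled system")
    # Value matrix: P = Q + K^T R K + Acl^T P Acl
    P = dlyap(Acl.T, Q + K.T @ R @ K)
    # State correlation: Sigma = Sigma0 + Acl Sigma Acl^T
    Sigma = dlyap(Acl, Sigma0)
    cost = np.trace(P @ Sigma0)
    grad = 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
    return cost, grad


def dr_policy_gradient(K0, sample_system, Q, R, Sigma0,
                       step=0.05, iters=300, batch=4, seed=0):
    """Gradient descent on K, drawing fresh systems at every step.

    Assumption: every sampled system is stabilized by the current K;
    otherwise lqr_cost_and_grad raises ValueError.
    """
    rng = np.random.default_rng(seed)
    K = K0.copy()
    for _ in range(iters):
        g = np.zeros_like(K)
        for _ in range(batch):
            A, B = sample_system(rng)  # new systems at each gradient step
            g += lqr_cost_and_grad(K, A, B, Q, R, Sigma0)[1]
        K = K - step * g / batch  # averaged stochastic gradient step
    return K


def sample_scalar_system(rng):
    """Hypothetical randomized model: scalar A drawn uniformly from [0.9, 1.1], B = 1."""
    a = rng.uniform(0.9, 1.1)
    return np.array([[a]]), np.array([[1.0]])


I1 = np.eye(1)
K_dr = dr_policy_gradient(np.array([[0.5]]), sample_scalar_system, I1, I1, I1)
print("domain randomized gain:", K_dr)
```

The sketch draws a new mini-batch of systems inside the loop rather than fixing one set of samples up front, which is the sampling scheme the abstract describes. The stability check is a simplification: it stops with an error instead of handling a step that leaves the stabilizing set.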

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Robot Manipulation and Learning