Global Convergence of Receding-Horizon Policy Search in Learning Estimator Designs

Xiangyuan Zhang; Saviz Mowlavi; Mouhacine Benosman; Tamer Başar

arXiv:2309.04831 · math.OC · September 12, 2023

TL;DR

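This paper introduces the receding-horizon policy gradient (RHPG) algorithm, a policy-search method with provable global convergence for learning the optimal linear estimator, i.e., the Kalman filter (KF). It needs no prior knowledge of the system for initialization and does not require the system to be open-loop stable.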

Contribution

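It develops the first policy gradient (PG) algorithm with provable global convergence for Kalman filter design. A dynamic programming outer loop decomposes the constrained, non-convex infinite-horizon problem into a sequence of unconstrained, strongly convex static estimation problems, each solved by policy search.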

Findings

01. Proves global convergence of RHPG for Kalman filter design.

02. Demonstrates RHPG's effectiveness on a large-scale convection-diffusion model.

03. Provides fine-grained analysis of the optimization landscape, together with convergence and sample complexity guarantees.

Abstract

We introduce the receding-horizon policy gradient (RHPG) algorithm, the first PG algorithm with provable global convergence in learning the optimal linear estimator designs, i.e., the Kalman filter (KF). Notably, the RHPG algorithm does not require any prior knowledge of the system for initialization and does not require the target system to be open-loop stable. The key of RHPG is that we integrate vanilla PG (or any other policy search directions) into a dynamic programming outer loop, which iteratively decomposes the infinite-horizon KF problem that is constrained and non-convex in the policy parameter into a sequence of static estimation problems that are unconstrained and strongly-convex, thus enabling global convergence. We further provide fine-grained analyses of the optimization landscape under RHPG and detail the convergence and sample complexity guarantees of the algorithm.…
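The decomposition described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: RHPG is model-free and works from sampled gradients, whereas this sketch assumes the system matrices are known and uses exact gradients, purely to show why each static subproblem is easy. The matrices `A`, `C`, `W`, `V`, the horizon, and all function names are hypothetical choices for illustration. Each stage minimizes the one-step error cost tr((I - LC) P (I - LC)^T + L V L^T) over the gain L, which is unconstrained and strongly convex when V is positive definite, so plain gradient descent from L = 0 reaches the unique minimizer. The outer loop then propagates the error covariance forward.

```python
import numpy as np

def static_gain_by_gd(P, C, V, iters=200):
    """Gradient descent on the static, strongly convex estimation cost
    J(L) = tr((I - L C) P (I - L C)^T + L V L^T), starting from L = 0."""
    S = C @ P @ C.T + V                      # innovation covariance (symmetric, positive definite)
    step = 1.0 / (2.0 * np.linalg.eigvalsh(S).max())
    L = np.zeros((P.shape[0], C.shape[0]))   # no stabilizing initial gain is needed
    for _ in range(iters):
        grad = 2.0 * (L @ S - P @ C.T)       # exact gradient of J with respect to L
        L = L - step * grad
    return L

def closed_form_gain(P, C, V):
    """Reference minimizer L = P C^T (C P C^T + V)^{-1}."""
    S = C @ P @ C.T + V
    return np.linalg.solve(S, C @ P).T       # valid because S is symmetric

def forward_recursion(A, C, W, V, P0, horizon, gain_fn):
    """Outer loop: solve one static problem per stage, then propagate
    the prediction error covariance to the next stage."""
    P = P0.copy()
    L = None
    for _ in range(horizon):
        L = gain_fn(P, C, V)
        I_LC = np.eye(P.shape[0]) - L @ C
        P_post = I_LC @ P @ I_LC.T + L @ V @ L.T   # error covariance after the update
        P = A @ P_post @ A.T + W                   # predicted covariance for the next stage
    return L, P

# Hypothetical open-loop unstable system (A has an eigenvalue 1.2 > 1).
A = np.array([[1.2, 0.5],
              [0.0, 0.9]])
C = np.array([[1.0, 0.0]])
W = 0.1 * np.eye(2)        # process noise covariance
V = np.array([[0.5]])      # measurement noise covariance
P0 = np.eye(2)

L_gd, P_gd = forward_recursion(A, C, W, V, P0, horizon=50, gain_fn=static_gain_by_gd)
L_ref, P_ref = forward_recursion(A, C, W, V, P0, horizon=50, gain_fn=closed_form_gain)
print("max gain difference:", np.abs(L_gd - L_ref).max())
```

The gradient-descent gains can be compared against the closed-form recursion to check that the two agree, as the final line does. The open-loop instability of `A` does not affect either the inner solves or the outer recursion in this sketch.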

Code & Models

Repositories

xiangyuan-zhang/learningkf (Official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research