Globally Convergent Policy Search over Dynamic Filters for Output Estimation
Jack Umenberger, Max Simchowitz, Juan C. Perdomo, Kaiqing Zhang, Russ Tedrake

TL;DR
This paper presents a novel policy search algorithm with provable global convergence to the optimal dynamic filter for output prediction in linear dynamical systems, addressing challenges of partial observability and internal state degeneracy.
Contribution
It introduces the concept of informativity and a regularizer to enforce it, enabling gradient descent to find the globally optimal filter with convergence guarantees.
Findings
Proposes a regularizer that enforces informativity in filters.
Establishes convergence of gradient descent to the optimal filter at an O(1/T) rate.
Develops new theoretical tools for analyzing non-convex gradient descent via convex reformulation.
Abstract
We introduce the first direct policy search algorithm which provably converges to the globally optimal filter for the classical problem of predicting the outputs of a linear dynamical system, given noisy, partial observations. Despite the ubiquity of partial observability in practice, theoretical guarantees for direct policy search algorithms, one of the backbones of modern reinforcement learning, have proven difficult to achieve. This is primarily due to the degeneracies which arise when optimizing over filters that maintain internal state. In this paper, we provide a new perspective on this challenging problem based on the notion of informativity, which intuitively requires that all components of a filter's internal state are representative of the true state of the underlying dynamical system. We show that informativity overcomes the aforementioned…
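To make the setting concrete, the following is a minimal sketch of output estimation with a dynamic filter. Everything in it is an illustrative assumption rather than the paper's formulation: the scalar discrete-time plant, the constants A_PLANT and C_PLANT, the unit noise variances, and the helper names simulate_cost and kalman_predictor are all hypothetical. It compares a filter whose internal state ignores the observations with the steady-state Kalman predictor, and shows that rescaling the filter's internal state leaves the prediction cost unchanged, which is one simple source of non-uniqueness when optimizing over filters with internal state. The empirical cross term between the true state and the filter state is used only as a rough proxy for the intuition behind informativity.

```python
import random

# Hypothetical scalar plant (an assumption, not taken from the paper):
#   x_next = A_PLANT * x + w,   y = C_PLANT * x + v,   w, v ~ N(0, 1)
A_PLANT, C_PLANT = 0.9, 1.0


def simulate_cost(a_k, b_k, c_k, steps=5000, seed=0):
    """Return (mean squared output-prediction error, mean of x * x_hat).

    The filter keeps its own internal state x_hat:
        x_hat_next = a_k * x_hat + b_k * y,   z_hat = c_k * x_hat
    and tries to predict z = C_PLANT * x before seeing the current y.
    """
    rng = random.Random(seed)  # same seed gives the same plant trajectory
    x, x_hat = 0.0, 0.0
    sq_err, cross = 0.0, 0.0
    for _ in range(steps):
        sq_err += (C_PLANT * x - c_k * x_hat) ** 2
        cross += x * x_hat
        y = C_PLANT * x + rng.gauss(0.0, 1.0)
        x_hat = a_k * x_hat + b_k * y
        x = A_PLANT * x + rng.gauss(0.0, 1.0)
    return sq_err / steps, cross / steps


def kalman_predictor(q=1.0, r=1.0, iters=200):
    """Steady-state Kalman predictor via fixed-point Riccati iteration."""
    a, c = A_PLANT, C_PLANT
    p = q
    for _ in range(iters):
        p = a * a * p - (a * p * c) ** 2 / (c * c * p + r) + q
    gain = a * p * c / (c * c * p + r)
    # x_hat_next = a * x_hat + gain * (y - c * x_hat)
    return a - gain * c, gain, c


a_k, b_k, c_k = kalman_predictor()
cost_kalman, cross_kalman = simulate_cost(a_k, b_k, c_k)

# A filter with b_k = 0 never reads y, so its internal state stays at zero.
cost_blind, cross_blind = simulate_cost(a_k, 0.0, c_k)

# Rescaling the internal state by s = 2 (b_k * s, c_k / s) gives the same predictions.
cost_scaled, _ = simulate_cost(a_k, 2.0 * b_k, c_k / 2.0)

print(cost_kalman, cost_blind, cost_scaled)
```

In this sketch the observation-blind filter has a zero cross term and a larger cost than the Kalman predictor, while the rescaled filter matches the Kalman predictor's cost. A policy search method would adjust (a_k, b_k, c_k) by gradient steps on such a cost; the regularizer described in the paper is not reproduced here.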
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research
