Structured Policy Iteration for Linear Quadratic Regulator

Youngsuk Park; Ryan A. Rossi; Zheng Wen; Gang Wu; Handong Zhao

arXiv:2007.06202 · cs.AI · July 14, 2020 · 6 cites

TL;DR

This paper introduces Structured Policy Iteration (S-PI) for LQR, enabling the derivation of structured policies with sparsity or low-rank properties, improving interpretability and efficiency in both known-model and model-free scenarios.

Contribution

The paper proposes a novel S-PI algorithm for regularized LQR that produces structured policies and extends it to model-free settings with convergence guarantees.

Findings

01. S-PI efficiently solves the regularized LQR problem, yielding structured policies.

02. Structured policies improve interpretability and memory efficiency.

03. Experiments show that S-PI trades off control performance against the level of structure in the policy.

Abstract

The linear quadratic regulator (LQR) is one of the most popular frameworks for tackling continuous Markov decision process tasks. With its fundamental theory and tractable optimal policy, LQR has been revisited and analyzed in recent years in reinforcement learning scenarios such as the model-free and model-based settings. In this paper, we introduce Structured Policy Iteration (S-PI) for LQR, a method capable of deriving a structured linear policy. Such a structured policy, with (block) sparsity or low rank, can have significant advantages over the standard LQR policy: it is more interpretable, more memory-efficient, and well suited to the distributed setting. To derive such a policy, we first cast a regularized LQR problem when the model is known. Then, our Structured Policy Iteration (S-PI) algorithm, which takes a policy evaluation step and a policy improvement step in an…
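The abstract is cut off before it describes the two steps in detail, so the following is only a minimal sketch of how a regularized LQR problem with a known model might be attacked by alternating policy evaluation and policy improvement. It rests on several assumptions that are not taken from the text above: a discrete-time system x_{t+1} = A x_t + B u_t with policy u_t = -K x_t, an l1 penalty `lam * ||K||_1` as the structure-inducing regularizer, an identity initial-state covariance, and a proximal gradient step (soft-thresholding) with backtracking as the improvement step. All function names (`dlyap`, `evaluate`, `structured_pi`, and so on) are hypothetical.

```python
import numpy as np


def dlyap(M, W):
    """Solve X = M X M^T + W by vectorization (adequate for small systems)."""
    n = M.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(M, M), W.reshape(-1))
    return x.reshape(n, n)


def is_stable(M):
    """True if the closed-loop matrix has spectral radius below 1."""
    return np.max(np.abs(np.linalg.eigvals(M))) < 1.0


def evaluate(K, A, B, Q, R, Sigma0):
    """Policy evaluation for u = -K x: value matrix P, state covariance, cost."""
    Acl = A - B @ K
    P = dlyap(Acl.T, Q + K.T @ R @ K)   # P = Acl^T P Acl + Q + K^T R K
    Sigma = dlyap(Acl, Sigma0)          # Sigma = Acl Sigma Acl^T + Sigma0
    return P, Sigma, np.trace(P @ Sigma0)


def lqr_cost(K, A, B, Q, R):
    """LQR cost of a stabilizing gain K (identity initial covariance assumed)."""
    return evaluate(K, A, B, Q, R, np.eye(A.shape[0]))[2]


def soft_threshold(K, tau):
    """Proximal operator of tau * ||K||_1 (assumed regularizer)."""
    return np.sign(K) * np.maximum(np.abs(K) - tau, 0.0)


def structured_pi(A, B, Q, R, K0, lam, eta0=1.0, iters=200):
    """Alternate policy evaluation and a proximal-gradient improvement step."""
    Sigma0 = np.eye(A.shape[0])
    K = K0.copy()
    for _ in range(iters):
        P, Sigma, f = evaluate(K, A, B, Q, R, Sigma0)
        grad = 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
        eta = eta0
        while True:  # backtrack until the step is stabilizing and decreases enough
            K_new = soft_threshold(K - eta * grad, eta * lam)
            if is_stable(A - B @ K_new):
                f_new = evaluate(K_new, A, B, Q, R, Sigma0)[2]
                D = K_new - K
                if f_new <= f + np.sum(grad * D) + np.sum(D * D) / (2.0 * eta):
                    break
            eta *= 0.5
            if eta < 1e-12:  # no acceptable step found; keep the current gain
                return K
        K = K_new
    return K


# Illustrative system (made up for this sketch); A is stable, so K0 = 0 is admissible.
A = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.1],
              [0.0, 0.0, 0.7]])
B, Q, R = np.eye(3), np.eye(3), np.eye(3)
K0 = np.zeros((3, 3))

K_dense = structured_pi(A, B, Q, R, K0, lam=0.0)
K_sparse = structured_pi(A, B, Q, R, K0, lam=1.0)
print("nonzeros:", np.count_nonzero(K_dense), "vs", np.count_nonzero(K_sparse))
print("cost:", lqr_cost(K_dense, A, B, Q, R), "vs", lqr_cost(K_sparse, A, B, Q, R))
```

Increasing `lam` in this sketch pushes more entries of the gain to exactly zero at the price of a higher LQR cost. The stability check inside the line search matters because the Lyapunov solutions are only meaningful for a stabilizing gain.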

Videos

Structured Policy Iteration for Linear Quadratic Regulator · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and ELM · Advanced Bandit Algorithms Research