Structured Policy Iteration for Linear Quadratic Regulator
Youngsuk Park, Ryan A. Rossi, Zheng Wen, Gang Wu, Handong Zhao

TL;DR
This paper introduces Structured Policy Iteration (S-PI) for the linear quadratic regulator (LQR), a method for deriving sparse or low-rank linear policies that are more interpretable and memory-efficient, in both known-model and model-free settings.
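Sparsity and low-rank structure are usually induced by a regularizer on the policy matrix K, and proximal methods handle such regularizers through their proximal operators. As a hedged illustration (the specific regularizers, and the row-wise block grouping in particular, are assumptions rather than details taken from the paper), the standard closed-form operators for the ℓ1 norm, a row-group norm, and the nuclear norm can be written in NumPy as follows. The function names are hypothetical.

```python
import numpy as np

def prox_l1(K, t):
    """Soft thresholding: prox of t * ||K||_1. Entries with |K_ij| <= t become zero (sparsity)."""
    return np.sign(K) * np.maximum(np.abs(K) - t, 0.0)

def prox_group_rows(K, t):
    """Group soft thresholding: prox of t * sum_i ||K[i, :]||_2.
    Treating each row as one block is an illustrative assumption."""
    norms = np.linalg.norm(K, axis=1, keepdims=True)
    # Shrink each row toward zero; rows with norm <= t become exactly zero (block sparsity).
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return scale * K

def prox_nuclear(K, t):
    """Singular value thresholding: prox of t * ||K||_* (nuclear norm), which lowers the rank."""
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    # Shrink singular values and rebuild; singular values <= t are dropped.
    return (U * np.maximum(s - t, 0.0)) @ Vt
```

For example, applying `prox_l1` with threshold 0.1 to `[[0.9, -0.05, 0.3], [0.02, 0.01, -0.04]]` zeroes every entry whose magnitude is at most 0.1, leaving only 0.8 and 0.2 (up to rounding) in the first row.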
Contribution
The paper proposes a novel S-PI algorithm for regularized LQR that produces structured policies and extends it to model-free settings with convergence guarantees.
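In the model-free setting the gradient of the LQR cost cannot be computed from the system matrices. One common way to estimate it, sketched below under stated assumptions, is two-point zeroth-order smoothing: query the cost at randomly perturbed policies K ± rU and average. This is a standard technique, not necessarily the estimator S-PI uses. For testability, the hypothetical `lqr_cost` evaluates the cost exactly from a model under the convention u = -Kx; in a genuinely model-free setting it would be replaced by averaged rollout costs. All function names are hypothetical.

```python
import numpy as np

def lyap(M, W):
    """Solve X = M X M^T + W by vectorization (adequate for small systems)."""
    n = M.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(M, M), W.reshape(-1))
    return x.reshape(n, n)

def lqr_cost(K, A, B, Q, R, Sigma0):
    """Exact infinite-horizon cost tr(P_K Sigma0) of u = -K x.
    Assumption: stands in for a rollout-based cost estimate."""
    Acl = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        return np.inf  # destabilizing policies have unbounded cost
    P = lyap(Acl.T, Q + K.T @ R @ K)
    return np.trace(P @ Sigma0)

def zeroth_order_grad(cost, K, radius=1e-3, num_samples=2000, seed=0):
    """Two-point smoothing estimate of grad cost(K) using only cost queries."""
    rng = np.random.default_rng(seed)
    d = K.size
    g = np.zeros_like(K)
    for _ in range(num_samples):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)  # random direction, uniform on the unit sphere
        diff = cost(K + radius * U) - cost(K - radius * U)
        g += (d / (2.0 * radius)) * diff * U
    return g / num_samples
```

Because only cost differences enter the estimate, the same function works unchanged if `cost` is an averaged rollout cost; the price is extra variance, which `num_samples` controls.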
Findings
S-PI efficiently solves regularized LQR with structured policies.
Structured policies improve interpretability and memory efficiency.
Experiments show that S-PI balances control performance against the level of structure in the policy.
Abstract
The linear quadratic regulator (LQR) is one of the most popular frameworks for tackling continuous Markov decision process tasks. With its fundamental theory and tractable optimal policy, LQR has been revisited and analyzed in recent years in reinforcement learning scenarios such as the model-free and model-based settings. In this paper, we introduce Structured Policy Iteration (S-PI) for LQR, a method capable of deriving a structured linear policy. Such a structured policy with (block) sparsity or low rank can have significant advantages over the standard LQR policy: it is more interpretable, more memory-efficient, and well suited to the distributed setting. To derive such a policy, we first formulate a regularized LQR problem for the case where the model is known. Then, our Structured Policy Iteration (S-PI) algorithm, which takes a policy evaluation step and a policy improvement step in an…
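The abstract describes S-PI as alternating a policy evaluation step and a policy improvement step on a regularized LQR objective. A minimal NumPy sketch of that pattern for the known-model case follows. It rests on assumptions not stated in the (truncated) abstract: the convention u_t = -K x_t, the cost tr(P_K Σ0) with Σ0 the initial-state covariance, an ℓ1 penalty λ‖K‖_1 as the regularizer, and a proximal gradient improvement step whose step size is halved until the new policy is stabilizing. The names `structured_pi`, `lyap`, `soft_threshold`, and `is_stable` are hypothetical; this is an illustration, not the authors' implementation.

```python
import numpy as np

def lyap(M, W):
    """Solve X = M X M^T + W by vectorization (adequate for small systems)."""
    n = M.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(M, M), W.reshape(-1))
    return x.reshape(n, n)

def soft_threshold(K, t):
    """Proximal operator of t * ||K||_1."""
    return np.sign(K) * np.maximum(np.abs(K) - t, 0.0)

def is_stable(M):
    """True if every eigenvalue of M lies strictly inside the unit circle."""
    return np.max(np.abs(np.linalg.eigvals(M))) < 1.0

def structured_pi(A, B, Q, R, Sigma0, K0, lam, step=1e-2, iters=500):
    """Sketch of l1-regularized LQR via evaluation and proximal improvement steps.
    K0 must be stabilizing; u = -K x is assumed."""
    K = K0.copy()
    for _ in range(iters):
        Acl = A - B @ K
        # Policy evaluation: value matrix P_K and aggregate state covariance Sigma_K.
        P = lyap(Acl.T, Q + K.T @ R @ K)
        Sigma = lyap(Acl, Sigma0)
        # Gradient of the unregularized cost tr(P_K Sigma0) with respect to K.
        grad = 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
        # Policy improvement: proximal gradient step on f(K) + lam * ||K||_1,
        # halving the step until the new policy is still stabilizing.
        eta = step
        K_new = soft_threshold(K - eta * grad, eta * lam)
        while not is_stable(A - B @ K_new):
            eta /= 2.0
            K_new = soft_threshold(K - eta * grad, eta * lam)
        K = K_new
    return K
```

With λ = 0 the loop reduces to plain policy gradient and should approach the Riccati solution; with a large λ the soft-thresholding step can drive K exactly to zero when the open-loop system is already stable, which is the sparsity effect in its most extreme form.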
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and ELM · Advanced Bandit Algorithms Research
