Policy iteration using Q-functions: Linear dynamics with multiplicative noise
Peter Coppens, Panagiotis Patrinos

TL;DR
This paper introduces a data-driven policy iteration method for quadratic regulation in linear systems with multiplicative noise, using Q-functions and least-squares estimation, and compares it with existing approaches.
Contribution
It proposes a model-free policy iteration scheme for linear systems with state- and input-multiplicative noise that estimates Q-functions by least squares with instrumental variables.
Findings
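The method estimates Q-functions for noisy linear systems by solving a least-squares problem with instrumental variables, similar to least-squares temporal difference for Markov decision processes.
Numerical experiments compare the scheme against a model-based system identification scheme and natural policy gradient, where it performs competitively.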
The approach is fully data-driven and does not require system identification.
Abstract
This paper presents a novel model-free and fully data-driven policy iteration scheme for quadratic regulation of linear dynamics with state- and input-multiplicative noise. The implementation is similar to the least-squares temporal difference scheme for Markov decision processes, estimating Q-functions by solving a least-squares problem with instrumental variables. The scheme is compared with a model-based system identification scheme and natural policy gradient through numerical experiments.
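The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes dynamics of the form x_next = (A + sum_i delta_i A_i) x + (B + sum_j gamma_j B_j) u with zero-mean scalar noises delta_i and gamma_j, a stage cost x'Qx + u'Ru, a linear policy u = Kx, and a purely quadratic Q-function z'Θz with z = [x; u]. The function names, the Gaussian sampling of states and inputs, and the use of independently sampled transitions instead of rollouts are all assumptions made for illustration. The current-step features serve as the instrumental variable, as in least-squares temporal difference.

```python
import numpy as np

def quad_features(z):
    """Features phi(z) with phi(z) @ theta == z @ Theta @ z, where theta
    stacks the upper-triangular entries of the symmetric matrix Theta."""
    i, j = np.triu_indices(z.size)
    scale = np.where(i == j, 1.0, 2.0)  # off-diagonal terms appear twice
    return scale * z[i] * z[j]

def theta_to_matrix(theta, n):
    """Rebuild the symmetric n-by-n matrix from its upper-triangular entries."""
    i, j = np.triu_indices(n)
    Theta = np.zeros((n, n))
    Theta[i, j] = theta
    Theta[j, i] = theta
    return Theta

def simulate_transitions(A, B, A_noise, B_noise, Qc, Rc, n_samples, rng,
                         noise_std=1.0):
    """Sample (x, u, cost, x_next) tuples from the assumed noisy dynamics.
    States and exploratory inputs are drawn i.i.d. standard normal (assumption)."""
    nx, nu = B.shape
    X = rng.standard_normal((n_samples, nx))
    U = rng.standard_normal((n_samples, nu))
    X_next = np.empty_like(X)
    for t in range(n_samples):
        delta = noise_std * rng.standard_normal(len(A_noise))
        gamma = noise_std * rng.standard_normal(len(B_noise))
        A_t = A + sum(d * Ai for d, Ai in zip(delta, A_noise))
        B_t = B + sum(g * Bj for g, Bj in zip(gamma, B_noise))
        X_next[t] = A_t @ X[t] + B_t @ U[t]
    C = (np.einsum("ti,ij,tj->t", X, Qc, X)
         + np.einsum("ti,ij,tj->t", U, Rc, U))
    return X, U, C, X_next

def lstd_q(X, U, C, X_next, K):
    """Policy evaluation: estimate Theta for the policy u = K x."""
    Z = np.hstack([X, U])
    Z_next = np.hstack([X_next, X_next @ K.T])  # next input follows the policy
    Phi = np.array([quad_features(z) for z in Z])
    Phi_next = np.array([quad_features(z) for z in Z_next])
    # Instrumental-variable normal equations: Phi' (Phi - Phi_next) theta = Phi' C
    theta = np.linalg.solve(Phi.T @ (Phi - Phi_next), Phi.T @ C)
    return theta_to_matrix(theta, Z.shape[1])

def improve_policy(Theta, nx):
    """Policy improvement: minimize z' Theta z over u for a fixed x."""
    return -np.linalg.solve(Theta[nx:, nx:], Theta[nx:, :nx])

def policy_iteration(X, U, C, X_next, K0, n_iters=10):
    """Alternate evaluation and improvement on one fixed batch of transitions."""
    K = K0
    for _ in range(n_iters):
        K = improve_policy(lstd_q(X, U, C, X_next, K), X.shape[1])
    return K
```

Because the exploratory inputs are sampled independently of the policy, the same batch can be reused at every iteration; only the next-step input `X_next @ K.T` changes with `K`. The sketch needs an initial gain `K0` that stabilizes the system (in the mean-square sense when noise is present), otherwise the quadratic Q-function it fits is not well defined.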
Taxonomy
Topics: Reinforcement Learning in Robotics
