Policy iteration using Q-functions: Linear dynamics with multiplicative noise

Peter Coppens; Panagiotis Patrinos

arXiv:2212.01192 · math.OC · December 5, 2022 · 1 citation

Open Access

TL;DR

This paper introduces a data-driven policy iteration method for quadratic regulation in linear systems with multiplicative noise, using Q-functions and least-squares estimation, and compares it with existing approaches.

Contribution

It proposes a model-free policy iteration scheme for quadratic regulation of linear systems with state- and input-multiplicative noise, in which Q-functions are estimated by least squares with instrumental variables.

Findings

01. The method estimates Q-functions for linear systems with multiplicative noise directly from data.

02. In numerical experiments, the scheme performs competitively with a model-based system identification scheme and natural policy gradient.

03. The approach is fully data-driven and does not require system identification.

Abstract

This paper presents a novel model-free and fully data-driven policy iteration scheme for quadratic regulation of linear dynamics with state- and input-multiplicative noise. The implementation is similar to the least-squares temporal difference scheme for Markov decision processes, estimating Q-functions by solving a least-squares problem with instrumental variables. The scheme is compared with a model-based system identification scheme and natural policy gradient through numerical experiments.
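
The least-squares step described in the abstract can be sketched as follows. This is a minimal illustration in the style of LSTD-Q, not the authors' implementation. It assumes an undiscounted quadratic cost x'Qx + u'Ru, a linear policy u = Kx that stabilizes the system in the mean-square sense, Gaussian multiplicative noise, independently sampled one-step transitions, and quadratic monomials of (x, u) used both as features and as instruments. All function and variable names (sample_transitions, lstdq_evaluate, improve, and so on) are hypothetical.

```python
import numpy as np

def quad_features(z):
    """Monomials z_i z_j for i <= j, so that theta @ quad_features(z) = z' Theta z."""
    i, j = np.triu_indices(len(z))
    return z[i] * z[j]

def theta_to_matrix(theta, n):
    """Rebuild the symmetric matrix Theta from the monomial coefficients."""
    M = np.zeros((n, n))
    i, j = np.triu_indices(n)
    M[i, j] = theta
    # An off-diagonal coefficient equals 2 * Theta_ij, so split it across (i, j) and (j, i).
    return (M + M.T) / 2

def sample_transitions(A, B, A_noise, B_noise, K, N, rng, explore=1.0):
    """Draw N independent transitions of x+ = (A + sum_i w_i A_i) x + (B + sum_j v_j B_j) u.

    Assumption: w_i and v_j are i.i.d. standard normal, and the input is the
    policy K x plus Gaussian exploration noise.
    """
    nx, nu = B.shape
    X = rng.standard_normal((N, nx))
    U = X @ K.T + explore * rng.standard_normal((N, nu))
    Xn = np.empty_like(X)
    for t in range(N):
        At = A + sum(rng.standard_normal() * Ai for Ai in A_noise)
        Bt = B + sum(rng.standard_normal() * Bj for Bj in B_noise)
        Xn[t] = At @ X[t] + Bt @ U[t]
    return X, U, Xn

def lstdq_evaluate(X, U, Xn, K, Qc, Rc):
    """Estimate Theta with Q_K(x, u) = [x; u]' Theta [x; u] from sampled transitions.

    Solves sum_t phi_t (phi_t - phi_next_t)' theta = sum_t phi_t c_t, where the
    current features phi_t act as the instrumental variables.
    """
    nz = X.shape[1] + U.shape[1]
    p = nz * (nz + 1) // 2
    A_ls = np.zeros((p, p))
    b_ls = np.zeros(p)
    for x, u, xn in zip(X, U, Xn):
        phi = quad_features(np.concatenate([x, u]))
        phi_next = quad_features(np.concatenate([xn, K @ xn]))  # next input follows the policy
        cost = x @ Qc @ x + u @ Rc @ u
        A_ls += np.outer(phi, phi - phi_next)
        b_ls += phi * cost
    return theta_to_matrix(np.linalg.solve(A_ls, b_ls), nz)

def improve(Theta, nx):
    """Greedy policy improvement: minimize z' Theta z over u, giving K = -Theta_uu^{-1} Theta_ux."""
    return -np.linalg.solve(Theta[nx:, nx:], Theta[nx:, :nx])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, B = np.array([[0.9]]), np.array([[0.5]])
    A_noise, B_noise = [np.array([[0.1]])], [np.array([[0.1]])]
    K = np.array([[-0.5]])
    X, U, Xn = sample_transitions(A, B, A_noise, B_noise, K, 5000, rng)
    Theta = lstdq_evaluate(X, U, Xn, K, np.eye(1), np.eye(1))
    print("estimated Theta:\n", Theta)
    print("improved gain:", improve(Theta, nx=1))
```

Alternating lstdq_evaluate and improve gives one possible policy iteration loop. Using the current features as instruments is what keeps the estimate consistent here: the next-state features are noisy, so an ordinary least-squares fit of the Bellman residual would be biased.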

Taxonomy

Topics: Reinforcement Learning in Robotics