Learning Expected Reward for Switched Linear Control Systems: A Non-Asymptotic View

Muhammad Abdullah Naeem; Miroslav Pajic

arXiv:2006.08105 · math.PR · June 16, 2020

TL;DR

This paper establishes non-asymptotic bounds for learning the expected reward of switched linear dynamical systems under a stationary Markov control policy, using an invariant ergodic measure and Birkhoff's Ergodic Theorem.

Contribution

It introduces a framework, previously lacking, for the non-asymptotic analysis of average-reward control in switched linear dynamical systems (SLDSs), built on invariant measures and ergodic theory.

Findings

01. Existence of an invariant ergodic measure under norm-stability.

02. Non-asymptotic bounds for learning the expected reward from time averages.

03. Validation through two case studies.
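For orientation, the generic (asymptotic) form of Birkhoff's Ergodic Theorem behind the time-average estimator is written out below. The notation is assumed for illustration and is not taken from the paper: $z_t$ denotes the closed-loop state (e.g. the continuous state together with the active mode), $r$ the reward, and $\mu$ the invariant ergodic measure.

```latex
\hat{R}_T \;=\; \frac{1}{T}\sum_{t=0}^{T-1} r(z_t)
\;\xrightarrow[T\to\infty]{\text{a.s.}}\;
\int r \, d\mu ,
\qquad r \in L^{1}(\mu),\ \text{for } \mu\text{-a.e. initial condition } z_0 .
```

This statement only guarantees convergence in the limit. The paper's findings concern how large the deviation $\lvert \hat{R}_T - \int r\,d\mu \rvert$ can be at a finite horizon $T$; the exact form of those bounds is not reproduced here.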

Abstract

In this work, we show the existence of an invariant ergodic measure for switched linear dynamical systems (SLDSs) under a norm-stability assumption on the system dynamics in some unbounded subset of $\mathbb{R}^{n}$. Consequently, given a stationary Markov control policy, we use Birkhoff's Ergodic Theorem to derive non-asymptotic bounds for learning the expected reward (with respect to the invariant ergodic measure to which the closed-loop system mixes) from time averages. The presented results provide a foundation for the non-asymptotic analysis of average-reward optimal control of SLDSs. Finally, we illustrate the theoretical results in two case studies.
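To make "learning the expected reward from time averages" concrete, here is a minimal simulation sketch. It assumes a hypothetical model that is not the paper's case study: Markov mode switching with transition matrix `P`, a mode-dependent linear feedback policy `u = K[s] x` as the stationary Markov policy, Gaussian process noise, and a quadratic reward. The function name `time_average_reward` and all parameters are illustrative.

```python
import numpy as np


def time_average_reward(A, B, K, P, Q, R, noise_std, T, seed=0):
    # Hypothetical closed-loop SLDS (illustrative, not the paper's exact setup):
    #   x_{t+1} = A[s_t] x_t + B[s_t] u_t + w_t,  w_t ~ N(0, noise_std^2 I)
    #   u_t     = K[s_t] x_t                       (stationary Markov policy)
    #   s_{t+1} ~ P[s_t, :]                        (Markov mode switching)
    #   r_t     = -(x_t' Q x_t + u_t' R u_t)       (quadratic reward)
    # Returns the time average (1/T) * sum_{t<T} r_t, the quantity that
    # Birkhoff's theorem relates to the expected reward under the invariant measure.
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    K = np.asarray(K, dtype=float)
    Q = np.asarray(Q, dtype=float)
    R = np.asarray(R, dtype=float)
    cum_P = np.cumsum(np.asarray(P, dtype=float), axis=1)
    n_modes, n = A.shape[0], A.shape[1]

    # Pre-draw all randomness up front for speed.
    noise = noise_std * rng.standard_normal((T, n))
    unif = rng.random(T)

    x = np.zeros(n)  # initial state
    s = 0            # initial mode
    total = 0.0
    for t in range(T):
        u = K[s] @ x
        total -= x @ Q @ x + u @ R @ u
        x = A[s] @ x + B[s] @ u + noise[t]
        # Inverse-CDF draw of the next mode from row s of P;
        # min() guards against round-off in the cumulative sum.
        s = min(int(np.searchsorted(cum_P[s], unif[t], side="right")), n_modes - 1)
    return total / T


if __name__ == "__main__":
    # Two scalar modes with closed-loop gains 1.0 - 0.5 = 0.5 and 1.2 - 0.4 = 0.8.
    A = [[[1.0]], [[1.2]]]
    B = [[[1.0]], [[1.0]]]
    K = [[[-0.5]], [[-0.4]]]
    P = [[0.5, 0.5], [0.5, 0.5]]  # identical rows: i.i.d. switching
    est = time_average_reward(A, B, K, P, Q=[[1.0]], R=[[0.1]],
                              noise_std=1.0, T=100_000)
    print("time-average estimate:", est)
```

In the usage example, the rows of `P` are identical, so the next mode is independent of the current state. The stationary second moment therefore satisfies $E[x^2] = \sigma_w^2 / (1 - E[a^2])$, where $a$ is the closed-loop gain. This gives an expected reward of about $-1.84$, and the time average should land close to it for large `T`. A finite-horizon estimate still fluctuates around that value, and this gap is what non-asymptotic bounds quantify.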

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Advanced Bandit Algorithms Research