Learning Expected Reward for Switched Linear Control Systems: A Non-Asymptotic View
Muhammad Abdullah Naeem, Miroslav Pajic

TL;DR
This paper establishes non-asymptotic bounds for learning the expected reward in switched linear dynamical systems under a stationary control policy, based on invariant ergodic measures and ergodic theorems.
Contribution
It introduces a non-asymptotic analysis framework, built on invariant measures and ergodic theory, for average-reward control of switched linear dynamical systems (SLDSs), for which such an analysis was previously lacking.
Findings
Existence of invariant ergodic measure under norm-stability.
Non-asymptotic bounds for reward learning from time-averages.
Validation through two case studies.
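For reference, the time-average statement behind the second finding is the standard Birkhoff ergodic theorem. The notation here is assumed for illustration and is not taken from the paper: $x_t$ is the closed-loop state, $r$ is the reward evaluated along the trajectory under the fixed policy, $\pi$ is the invariant ergodic measure, and $N$ is the number of samples.

```latex
% Birkhoff's ergodic theorem for an ergodic invariant measure \pi
% and an integrable reward r \in L^1(\pi):
\[
  \frac{1}{N}\sum_{t=0}^{N-1} r(x_t)
  \;\xrightarrow[N \to \infty]{}\;
  \int r \, d\pi
  \qquad \text{almost surely, for } \pi\text{-almost every } x_0 .
\]
```

A non-asymptotic bound of the kind listed above controls the gap between the two sides at a finite $N$, rather than only in the limit.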
Abstract
In this work, we show the existence of an invariant ergodic measure for switched linear dynamical systems (SLDSs) under a norm-stability assumption on the system dynamics in some unbounded subset of $\mathbb{R}^n$. Consequently, given a stationary Markov control policy, we derive non-asymptotic bounds for learning the expected reward (w.r.t. the invariant ergodic measure that our closed-loop system mixes to) from time-averages using Birkhoff's Ergodic Theorem. The presented results provide a foundation for non-asymptotic analysis of average reward-based optimal control of SLDSs. Finally, we illustrate the presented theoretical results in two case studies.
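The idea of learning an expected reward from time-averages can be illustrated with a toy simulation. This is a minimal sketch under assumptions that are not from the paper: a scalar system with two modes selected by the sign of the state, hypothetical gains, a linear state-feedback policy, unit Gaussian noise, and a quadratic reward. Both closed-loop gains have magnitude below one, which stands in for the norm-stability assumption.

```python
import random


def simulate_time_average(n_steps, x0, seed):
    """Return the time-averaged reward along one closed-loop trajectory.

    All numerical values below are hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)
    a = {0: 0.9, 1: -0.5}  # open-loop gain of each mode (assumed)
    b, k = 1.0, 0.3        # input gain and feedback gain (assumed)
    x = x0
    total = 0.0
    for _ in range(n_steps):
        u = -k * x                       # stationary Markov policy: u depends on x only
        total += -(x * x + 0.1 * u * u)  # quadratic reward r(x, u) (assumed)
        mode = 0 if x >= 0 else 1        # state-dependent switching rule (assumed)
        # Closed-loop gains are 0.6 (mode 0) and -0.8 (mode 1), both stable.
        x = a[mode] * x + b * u + rng.gauss(0.0, 1.0)
    return total / n_steps


if __name__ == "__main__":
    # Different initial states and noise realizations; the averages should be close.
    print(simulate_time_average(200_000, 10.0, seed=1))
    print(simulate_time_average(200_000, -10.0, seed=2))
```

Running the function from different initial states and seeds should give nearly the same average, which is the behaviour an ergodic theorem predicts; how fast the two agree as `n_steps` grows is what a non-asymptotic bound quantifies.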
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Advanced Bandit Algorithms Research
