Online learning with dynamics: A minimax perspective
Kush Bhatia, Karthik Sridharan

TL;DR
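This paper develops a minimax framework for online learning in stateful environments whose dynamics and costs may vary over time, possibly adversarially. It gives upper bounds on policy regret that depend on the complexity of the policy class and the stability of the dynamics, and these bounds extend to settings such as non-linear dynamics with non-convex losses.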
Contribution
It introduces a unifying analysis framework for online learning with dynamics. The framework yields regret bounds that incorporate the complexity of the policy class and the stability of the dynamics, including new bounds for non-linear dynamics.
Findings
Derived non-constructive upper bounds on the minimax policy regret in dynamic environments.
Established necessary conditions for online learnability in this setting.
Extended regret analysis to non-linear dynamics with non-convex losses.
Abstract
We study the problem of online learning with dynamics, where a learner interacts with a stateful environment over multiple rounds. In each round of the interaction, the learner selects a policy to deploy and incurs a cost that depends on both the chosen policy and current state of the world. The state-evolution dynamics and the costs are allowed to be time-varying, in a possibly adversarial way. In this setting, we study the problem of minimizing policy regret and provide non-constructive upper bounds on the minimax rate for the problem. Our main results provide sufficient conditions for online learnability for this setup with corresponding rates. The rates are characterized by 1) a complexity term capturing the expressiveness of the underlying policy class under the dynamics of state change, and 2) a dynamics stability term measuring the deviation of the instantaneous loss from a…
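The central quantity in this setting is policy regret. This is the learner's cumulative cost along its own state trajectory, minus the smallest cumulative cost that any fixed policy in the class would have incurred along that policy's own counterfactual trajectory. The minimal sketch below makes the definition concrete. Everything in the toy setup is a hypothetical illustration, not the paper's model or algorithm: the scalar linear dynamics x_{t+1} = a_t x_t + b_t u_t + w_t, the quadratic cost q x^2 + r u^2, the finite class of feedback gains u = -k x, the oblivious (pre-specified) sequence (a_t, b_t, w_t), and the helper names `rollout_cost` and `policy_regret`.

```python
from typing import Sequence


def rollout_cost(gains: Sequence[float], a: Sequence[float], b: Sequence[float],
                 w: Sequence[float], x0: float = 1.0,
                 q: float = 1.0, r: float = 0.1) -> float:
    """Cumulative cost of playing feedback gain gains[t] at round t.

    Hypothetical toy model: scalar state, time-varying linear dynamics,
    quadratic cost q*x^2 + r*u^2 (assumptions, not taken from the paper).
    """
    x = x0
    total = 0.0
    for t, k in enumerate(gains):
        u = -k * x                          # linear state-feedback policy u = -k x
        total += q * x * x + r * u * u      # cost depends on the policy and the current state
        x = a[t] * x + b[t] * u + w[t]      # state evolves under round-t dynamics
    return total


def policy_regret(learner_gains: Sequence[float], policy_class: Sequence[float],
                  a: Sequence[float], b: Sequence[float], w: Sequence[float],
                  x0: float = 1.0) -> float:
    """Learner's cost minus the best fixed policy's cost on its own trajectory."""
    T = len(learner_gains)
    learner_cost = rollout_cost(learner_gains, a, b, w, x0)
    # Each comparator policy is replayed from x0, so it induces its own
    # counterfactual state sequence rather than the learner's states.
    best_fixed = min(rollout_cost([k] * T, a, b, w, x0) for k in policy_class)
    return learner_cost - best_fixed


if __name__ == "__main__":
    T = 5
    a, b, w = [1.0] * T, [1.0] * T, [0.0] * T
    gains = [0.0, 0.5, 1.0]
    print(policy_regret([0.0] * T, gains, a, b, w))  # never stabilizing is costly
```

In this toy instance the gain k = 1 drives the state to zero after one step, so it is the best fixed policy. A learner that always plays k = 0 therefore incurs positive policy regret. Because the comparator is restricted to fixed policies, an adaptive learner can in principle have negative policy regret.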
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
