Online learning with dynamics: A minimax perspective

Kush Bhatia; Karthik Sridharan

arXiv:2012.01705 · cs.LG · December 4, 2020

TL;DR

This paper develops a minimax framework for online learning in stateful environments whose dynamics and costs may vary over time, possibly adversarially. It bounds policy regret in terms of the complexity of the policy class and the stability of the dynamics.

Contribution

It introduces a unifying analysis framework for online learning with dynamics, deriving regret bounds that incorporate policy class complexity and environment stability, including new bounds for non-linear dynamics.

Findings

01 Derived non-constructive upper bounds on the minimax policy regret for dynamic environments.

02 Established sufficient conditions for online learnability in this setting, with corresponding rates.

03 Extended regret analysis to non-linear dynamics with non-convex losses.

Abstract

We study the problem of online learning with dynamics, where a learner interacts with a stateful environment over multiple rounds. In each round of the interaction, the learner selects a policy to deploy and incurs a cost that depends on both the chosen policy and current state of the world. The state-evolution dynamics and the costs are allowed to be time-varying, in a possibly adversarial way. In this setting, we study the problem of minimizing policy regret and provide non-constructive upper bounds on the minimax rate for the problem. Our main results provide sufficient conditions for online learnability for this setup with corresponding rates. The rates are characterized by 1) a complexity term capturing the expressiveness of the underlying policy class under the dynamics of state change, and 2) a dynamics stability term measuring the deviation of the instantaneous loss from a…
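The interaction protocol and the notion of policy regret described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's formal setup: states and actions are scalars, the dynamics are deterministic, and the comparator is the best fixed policy in a finite class, evaluated on the counterfactual state trajectory it would itself have induced. The names `rollout_cost`, `policy_regret`, and `linear_policy`, as well as the toy dynamics and loss, are hypothetical and chosen only for this example.

```python
from typing import Callable, Sequence

# Hypothetical scalar types, chosen only for this illustration.
Policy = Callable[[float], float]            # state -> action
Dynamics = Callable[[float, float], float]   # (state, action) -> next state
Loss = Callable[[float, float], float]       # (state, action) -> cost


def rollout_cost(policies: Sequence[Policy], dynamics: Sequence[Dynamics],
                 losses: Sequence[Loss], x0: float) -> float:
    """Total cost of deploying policies[t] in round t, starting from state x0."""
    x, total = x0, 0.0
    for pi, f, loss in zip(policies, dynamics, losses):
        u = pi(x)            # the deployed policy picks an action from the state
        total += loss(x, u)  # the cost depends on both the state and the action
        x = f(x, u)          # the (possibly time-varying) dynamics move the state
    return total


def policy_regret(played: Sequence[Policy], policy_class: Sequence[Policy],
                  dynamics: Sequence[Dynamics], losses: Sequence[Loss],
                  x0: float) -> float:
    """Learner's cost minus the cost of the best fixed policy in the class.

    Each comparator is re-run from x0, so it is charged on the states it
    would have visited itself, not on the learner's states.
    """
    T = len(played)
    learner_cost = rollout_cost(played, dynamics, losses, x0)
    best_fixed = min(rollout_cost([pi] * T, dynamics, losses, x0)
                     for pi in policy_class)
    return learner_cost - best_fixed


def linear_policy(gain: float) -> Policy:
    """Assumed toy policy: action = -gain * state."""
    return lambda x: -gain * x


# Toy instance (assumed): x_{t+1} = x_t + u_t with cost x_t^2 + u_t^2.
T = 5
policy_class = [linear_policy(k) for k in (0.0, 0.5, 1.0)]
dynamics = [lambda x, u: x + u] * T
losses = [lambda x, u: x * x + u * u] * T

played = [policy_class[0]] * T  # a learner that never acts
regret = policy_regret(played, policy_class, dynamics, losses, x0=1.0)
print(regret)
```

Re-running each comparator from the initial state is what separates policy regret from a comparison made on the learner's own state sequence: a fixed policy is judged on the trajectory it would have produced had it been deployed in every round.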

Videos

Online learning with dynamics: A minimax perspective · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics