A General Framework for Sequential Decision-Making under Adaptivity Constraints
Nuoya Xiong, Zhaoran Wang, Zhuoran Yang

TL;DR
This paper introduces a unified framework for sequential decision-making under adaptivity constraints, providing algorithms with optimal regret and switching costs applicable to a wide range of reinforcement learning models.
Contribution
It proposes the Eluder Condition (EC) class and algorithms for the rare policy switch and batch learning constraints, covering many existing models as well as new classes.
Findings
Achieves $\tilde{O}(\log K)$ switching cost with $\tilde{O}(\sqrt{K})$ regret for the EC class over $K$ episodes.
Provides a regret bound of $\tilde{O}(\sqrt{K}+K/B)$ for batch learning with $B$ batches.
First work to address these constraints under general function classes.
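To make the batch learning bound concrete, the short sketch below evaluates the rate $\sqrt{K}+K/B$ for a few batch counts. It is only an arithmetic illustration, not the paper's algorithm: the function name `batch_regret_rate` and the value of `K` are assumptions chosen for the example, and constants and logarithmic factors hidden by $\tilde{O}$ are ignored.

```python
import math


def batch_regret_rate(K: int, B: int) -> float:
    """Evaluate sqrt(K) + K/B, ignoring constants and log factors.

    K is the number of episodes and B the number of batches.
    The name and signature are hypothetical, used for illustration only.
    """
    if K <= 0 or B <= 0:
        raise ValueError("K and B must be positive")
    return math.sqrt(K) + K / B


if __name__ == "__main__":
    K = 10_000  # illustrative number of episodes, not taken from the paper
    for B in (1, 10, 100, 1_000):
        print(f"B={B:>5}  rate={batch_regret_rate(K, B):.1f}")
```

Once $B$ reaches roughly $\sqrt{K}$, the $K/B$ term is no larger than $\sqrt{K}$, so adding more batches beyond that point improves the rate by at most a constant factor.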
Abstract
We take the first step in studying general sequential decision-making under two adaptivity constraints: rare policy switch and batch learning. First, we provide a general class called the Eluder Condition (EC) class, which includes a wide range of reinforcement learning classes. Then, for the rare policy switch constraint, we provide a generic algorithm to achieve a $\tilde{O}(\log K)$ switching cost with a $\tilde{O}(\sqrt{K})$ regret on the EC class. For the batch learning constraint, we provide an algorithm that provides a $\tilde{O}(\sqrt{K}+K/B)$ regret with the number of batches $B$. This paper is the first work considering rare policy switch and batch learning under general function classes, which covers nearly all the models studied in the previous works such as tabular MDP (Bai et al. 2019; Zhang et al. 2020), linear MDP (Wang et al. 2021;…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
