Asymptotic Instance-Optimal Algorithms for Interactive Decision Making
Kefan Dong, Tengyu Ma

TL;DR
This paper introduces the first asymptotically instance-optimal algorithm for general interactive decision making with a finite number of decisions. On every instance, its asymptotic regret scales with that instance's complexity rather than with the worst case, and no consistent algorithm does better.
Contribution
It develops an algorithm that achieves asymptotic instance optimality in interactive decision making, using hypothesis testing and active data collection to adapt to problem complexity.
Findings
Recovers classical gap-dependent bounds for multi-armed bandits.
Improves upon previous instance-dependent bounds for reinforcement learning.
Asymptotically matches or outperforms every consistent algorithm (one with non-trivial regret on all instances) on every problem instance.
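To make the first finding concrete, here is a minimal sketch of the classical Lai-Robbins gap-dependent coefficient for a multi-armed bandit. It assumes unit-variance Gaussian rewards, so the KL divergence between two arms is the squared gap divided by 2. The function names and the example means are illustrative assumptions, not taken from the paper.

```python
import math


def gaussian_kl(mu_p, mu_q, sigma=1.0):
    """KL divergence between N(mu_p, sigma^2) and N(mu_q, sigma^2)."""
    return (mu_p - mu_q) ** 2 / (2.0 * sigma ** 2)


def lai_robbins_coefficient(means, sigma=1.0):
    """Coefficient of ln(n) in the classical Lai-Robbins asymptotic bound.

    Sums gap / KL(arm, best arm) over all suboptimal arms.
    Assumes Gaussian rewards with a shared, known variance.
    """
    best = max(means)
    total = 0.0
    for mu in means:
        gap = best - mu
        if gap > 0:
            total += gap / gaussian_kl(mu, best, sigma)
    return total


# Hypothetical three-armed instance with gaps 0.1 and 0.4.
means = [0.9, 0.8, 0.5]
coefficient = lai_robbins_coefficient(means)

# Leading-order regret after n rounds is roughly coefficient * ln(n).
n = 10_000
approx_regret = coefficient * math.log(n)
print(f"coefficient = {coefficient:.3f}, approx regret at n={n}: {approx_regret:.1f}")
```

With unit variance each suboptimal arm contributes 2 divided by its gap, so arms that are nearly optimal dominate the coefficient. That is the sense in which an easy instance (large gaps) admits smaller regret than a hard one.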
Abstract
Past research on interactive decision making problems (bandits, reinforcement learning, etc.) mostly focuses on the minimax regret that measures the algorithm's performance on the hardest instance. However, an ideal algorithm should adapt to the complexity of a particular problem instance and incur smaller regrets on easy instances than worst-case instances. In this paper, we design the first asymptotic instance-optimal algorithm for general interactive decision making problems with a finite number of decisions under mild conditions. On every instance $f$, our algorithm outperforms all consistent algorithms (those achieving non-trivial regrets on all instances), and has asymptotic regret $\mathcal{C}(f) \ln n$, where $\mathcal{C}(f)$ is an exact characterization of the complexity of $f$. The key step of the algorithm involves hypothesis testing with active data collection. It computes the…
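The idea of testing an instance at minimum cost can be sketched as an optimization problem in the style of the classical Graves-Lai lower bound. The notation below is assumed for illustration and is not quoted from the abstract: $\Pi$ is the finite decision set, $\Delta(f,\pi)$ is the regret of decision $\pi$ under instance $f$, $f[\pi]$ is the observation distribution of $\pi$ under $f$, and $\pi^\star(\cdot)$ is the optimal decision.

```latex
\mathcal{C}(f) \;=\; \min_{w \in \mathbb{R}_{\ge 0}^{|\Pi|}} \;\sum_{\pi \in \Pi} w_\pi \, \Delta(f, \pi)
\quad \text{subject to} \quad
\sum_{\pi \in \Pi} w_\pi \, D_{\mathrm{KL}}\!\left(f[\pi] \,\|\, g[\pi]\right) \;\ge\; 1
\quad \text{for all } g \text{ with } \pi^\star(g) \neq \pi^\star(f).
```

Under this reading, $w_\pi$ is how often decision $\pi$ is played per unit of $\ln n$. The constraint requires enough information to rule out every alternative instance $g$ with a different optimal decision, and the objective is the regret paid to collect that information.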
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Machine Learning and Algorithms
