Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games
Gabriele Farina, Tuomas Sandholm

TL;DR
This paper introduces a regret-minimization algorithm that does not require a known model of the decision process, which makes it applicable to unknown environments and adversarial settings.
Contribution
It presents the first regret-minimization algorithm with sublinear regret guarantees that does not assume knowledge of the decision space or payoffs, expanding applicability to unknown and black-box environments.
Findings
Achieves $O(T^{3/4})$ regret with high probability in unknown decision spaces
In experiments, outperforms prior algorithms that lack such guarantees
Applicable to various equilibrium and opponent modeling problems
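As a short worked step on why the regret bound above counts as sublinear: writing $R_T$ for the cumulative regret after $T$ rounds (notation assumed here for illustration, not taken from this page), dividing the bound by $T$ shows that the average per-round regret vanishes whenever the bound holds:

$$\frac{R_T}{T} = O\!\left(\frac{T^{3/4}}{T}\right) = O\!\left(T^{-1/4}\right) \longrightarrow 0 \quad \text{as } T \to \infty.$$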
Abstract
Regret minimization has proved to be a versatile tool for tree-form sequential decision making and extensive-form games. In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium. Most regret-minimization algorithms for tree-form sequential decision making, including CFR, require (i) an exact model of the player's decision nodes, observation nodes, and how they are linked, and (ii) full knowledge, at all times t, about the payoffs -- even in parts of the decision space that are not encountered at time t. Recently, there has been growing interest towards relaxing some of those restrictions and making regret minimization applicable to settings for which reinforcement learning methods have traditionally been used -- for example, those in which only…
