Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games

Gabriele Farina; Tuomas Sandholm

arXiv:2103.04539·cs.GT·March 9, 2021

TL;DR

This paper introduces a regret-minimization algorithm for tree-form sequential decision making that does not require a known model of the decision process, so it can be applied in unknown environments and in adversarial settings.

Contribution

The paper presents the first regret-minimization algorithm with sublinear regret guarantees that does not assume knowledge of the decision space or the payoffs, extending regret minimization to unknown and black-box environments.

Findings

01. Achieves $O(T^{3/4})$ regret with high probability when the decision space is unknown.

02. In experiments, outperforms prior algorithms that lack such guarantees.

03. Applies to a range of equilibrium-finding and opponent-modeling problems.
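To make the first finding concrete: a cumulative regret bound of $O(T^{3/4})$ is sublinear in $T$, so the implied per-round (average) regret shrinks like $T^{-1/4}$. The sketch below evaluates that implied bound for a few horizons. The function name `average_regret_bound` and the constant `c` are illustrative assumptions; this summary does not state the constant hidden in the $O(\cdot)$.

```python
def average_regret_bound(T: int, c: float = 1.0) -> float:
    """Per-round regret implied by a cumulative bound of c * T**(3/4).

    `c` is a hypothetical constant standing in for whatever the
    O(.) notation hides; it is not taken from the paper.
    """
    if T <= 0:
        raise ValueError("T must be a positive number of rounds")
    cumulative_bound = c * T ** 0.75
    return cumulative_bound / T  # equals c * T**(-1/4)


# The implied average regret decreases as the horizon grows.
for horizon in (10**2, 10**4, 10**6):
    print(horizon, average_regret_bound(horizon))
```

Because the decay is only $T^{-1/4}$, cutting the implied average regret by a factor of 10 takes roughly 10,000 times as many rounds.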

Abstract

Regret minimization has proved to be a versatile tool for tree-form sequential decision making and extensive-form games. In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium. Most regret-minimization algorithms for tree-form sequential decision making, including CFR, require (i) an exact model of the player's decision nodes, observation nodes, and how they are linked, and (ii) full knowledge, at all times t, about the payoffs -- even in parts of the decision space that are not encountered at time t. Recently, there has been growing interest towards relaxing some of those restrictions and making regret minimization applicable to settings for which reinforcement learning methods have traditionally been used -- for example, those in which only…
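Requirement (ii) above can be illustrated with a minimal sketch of regret matching, a standard full-feedback regret minimizer of the kind CFR runs at each decision node. This is not the paper's algorithm: it handles a single decision point with a known action set, and its update reads the utility of every action each round, including actions that were not played. The function names and the random utility sequence are assumptions made for the example.

```python
import random


def regret_matching_strategy(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    if total <= 0.0:
        return [1.0 / n] * n  # no positive regret yet: play uniformly
    return [p / total for p in positive]


def run_regret_matching(utilities):
    """Run regret matching on a list of per-round utility vectors.

    Full feedback is assumed: every round reveals the utility of all
    actions. Returns (average strategy, cumulative regret).
    """
    n = len(utilities[0])
    cum_regret = [0.0] * n
    cum_strategy = [0.0] * n
    cum_action_utility = [0.0] * n
    earned = 0.0
    for u in utilities:
        x = regret_matching_strategy(cum_regret)
        expected = sum(xi * ui for xi, ui in zip(x, u))
        earned += expected
        for a in range(n):
            cum_regret[a] += u[a] - expected  # uses u[a] for every action a
            cum_strategy[a] += x[a]
            cum_action_utility[a] += u[a]
    rounds = len(utilities)
    regret = max(cum_action_utility) - earned
    return [s / rounds for s in cum_strategy], regret


random.seed(0)
rounds, num_actions = 2000, 3
utilities = [[random.random() for _ in range(num_actions)] for _ in range(rounds)]
avg_strategy, regret = run_regret_matching(utilities)
print("average strategy:", avg_strategy)
print("cumulative regret:", regret)
```

With utilities in $[0, 1]$, regret matching's cumulative regret is at most $\sqrt{nT}$ for $n$ actions and $T$ rounds, on any utility sequence. The model-free setting described in this abstract drops the known action set and the full-feedback assumption that this sketch relies on.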
