Model Selection for Average Reward RL with Application to Utility Maximization in Repeated Games

Alireza Masoumian; James R. Wright

arXiv:2411.06069·cs.LG·November 12, 2024

TL;DR

This paper introduces MRBEAR, an online model selection algorithm for average reward reinforcement learning, demonstrating its efficiency in selecting models with minimal regret and applying it to strategic repeated games against opponents with unknown memory limits.

Contribution

The paper proposes MRBEAR, a novel algorithm for model selection in average reward RL, with proven regret bounds and a practical application to repeated game scenarios.

Findings

01

MRBEAR's regret is in $\tilde{O}(M C_{m^{*}}^{2} B_{m^{*}}(T, \delta))$, so the additional cost of model selection scales only linearly in the number of model classes $M$.

02

In repeated games, the algorithm's regret depends on opponent complexity and is nearly optimal.

03

Exponential dependence of the regret on the opponent's memory $m^{*}$ is proven to be unavoidable.

Abstract

In standard RL, a learner attempts to learn an optimal policy for a Markov Decision Process (MDP) whose structure (e.g. state space) is known. In online model selection, a learner attempts to learn an optimal policy for an MDP knowing only that it belongs to one of $M > 1$ model classes of varying complexity. Recent results have shown that this can be feasibly accomplished in episodic online RL. In this work, we propose MRBEAR, an online model selection algorithm for the average reward RL setting. The regret of the algorithm is in $\tilde{O}(M C_{m^{*}}^{2} B_{m^{*}}(T, \delta))$, where $C_{m^{*}}$ represents the complexity of the simplest well-specified model class and $B_{m^{*}}(T, \delta)$ is its corresponding regret bound. This result shows that in average reward RL, as in episodic online RL, the additional cost of model selection scales only linearly in $M$, the number…
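
As a reading aid, the bound stated in the abstract can be set out in display form. The symbol $\mathrm{Reg}(T)$ for the regret of MRBEAR after $T$ steps is notation assumed here for illustration and does not appear in the abstract. The second line only rearranges the first; it is not an additional result from the paper.

```latex
% Regret bound as stated in the abstract; Reg(T) is assumed notation.
\[
  \mathrm{Reg}(T) \in \tilde{O}\!\left( M \, C_{m^{*}}^{2} \, B_{m^{*}}(T, \delta) \right)
\]
% Rearranged: the overhead relative to the bound B_{m^*}(T, \delta) of the
% simplest well-specified model class is a factor linear in M and quadratic
% in C_{m^*}, up to the logarithmic terms hidden by \tilde{O}.
\[
  \frac{\mathrm{Reg}(T)}{B_{m^{*}}(T, \delta)} \in \tilde{O}\!\left( M \, C_{m^{*}}^{2} \right)
\]
```

Read this way, the price of not knowing the correct model class in advance is a multiplicative factor on $B_{m^{*}}(T, \delta)$, and that factor grows only linearly in the number of model classes $M$.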

Taxonomy

Topics: Statistical and Computational Modeling