Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation

Jianliang He; Han Zhong; Zhuoran Yang

arXiv:2404.12648 · cs.LG · April 22, 2024

TL;DR

This paper introduces a new algorithmic framework called LOOP for learning infinite-horizon average-reward MDPs with general function approximation, providing a unified theoretical analysis and regret bounds across a broad class of AMDP models.

Contribution

The paper proposes the LOOP algorithm and the AGEC complexity measure, giving a unified regret analysis that covers nearly all known tractable AMDP models.

Findings

1. LOOP achieves sublinear regret bounds in general AMDP settings.
2. The AGEC complexity measure captures the exploration challenges across diverse AMDP models.
3. The framework encompasses linear, kernel, and Bellman eluder dimension-based AMDPs.
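As a reference point for what "sublinear regret" means here, the standard average-reward objective (gain) and regret over T steps can be written as follows. This is the usual textbook formulation; we assume the paper's definitions agree with it up to notation (for example, the paper may index the gain by starting state), which this summary does not confirm.

```latex
J^{\pi} = \liminf_{T \to \infty} \frac{1}{T}\,\mathbb{E}^{\pi}\Big[\sum_{t=1}^{T} r(s_t, a_t)\Big],
\qquad J^{*} = \sup_{\pi} J^{\pi},
\\[4pt]
\mathrm{Reg}(T) = \sum_{t=1}^{T} \big(J^{*} - r(s_t, a_t)\big),
\qquad \text{sublinear regret: } \mathrm{Reg}(T) = o(T).
```

Sublinear regret means the learner's per-step shortfall, Reg(T)/T, vanishes as T grows, so its long-run average reward approaches the optimal gain J*.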

Abstract

We study infinite-horizon average-reward Markov decision processes (AMDPs) in the context of general function approximation. Specifically, we propose a novel algorithmic framework named Local-fitted Optimization with OPtimism (LOOP), which incorporates both model-based and value-based incarnations. In particular, LOOP features a novel construction of confidence sets and a low-switching policy updating scheme, which are tailored to the average-reward and function approximation setting. Moreover, for AMDPs, we propose a novel complexity measure -- average-reward generalized eluder coefficient (AGEC) -- which captures the challenge of exploration in AMDPs with general function approximation. Such a complexity measure encompasses almost all previously known tractable AMDP models, such as linear AMDPs and linear mixture AMDPs, and also includes newly identified cases such as kernel AMDPs and…
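To make two ingredients named in the abstract more concrete, here is a toy Python sketch of (i) optimistic selection from a confidence set and (ii) a low-switching policy update schedule. The abstract does not spell out LOOP's actual constructions, so the squared-loss confidence set, the threshold `beta`, the doubling schedule, and every name below (`confidence_set`, `optimistic_choice`, `switch_times`, the hypotheses and gains) are hypothetical stand-ins of a kind common in the literature, not the paper's method.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Toy sketch only: the hypothesis class, squared-loss confidence set and
# doubling schedule are illustrative assumptions, not LOOP's constructions.
Hypothesis = Callable[[float], float]


def confidence_set(
    hypotheses: Dict[str, Hypothesis],
    data: Sequence[Tuple[float, float]],
    beta: float,
) -> List[str]:
    """Keep every hypothesis whose empirical squared loss is within beta of the best fit."""
    losses = {
        name: sum((f(x) - y) ** 2 for x, y in data)
        for name, f in hypotheses.items()
    }
    best = min(losses.values())
    return [name for name, loss in losses.items() if loss <= best + beta]


def optimistic_choice(candidates: List[str], gain: Dict[str, float]) -> str:
    """Optimism: among plausible hypotheses, pick the one promising the largest average reward."""
    return max(candidates, key=lambda name: gain[name])


def switch_times(horizon: int) -> List[int]:
    """Doubling rule: re-plan only at t = 1, 2, 4, 8, ..., i.e. O(log T) policy switches."""
    times, t = [], 1
    while t <= horizon:
        times.append(t)
        t *= 2
    return times


# Hypothetical example: three candidate models of a scalar response.
hypotheses: Dict[str, Hypothesis] = {
    "a": lambda x: 0.5 * x,
    "b": lambda x: 0.5 * x + 0.1,
    "c": lambda x: 2.0 * x,
}
data = [(1.0, 0.5), (2.0, 1.0)]        # observed (input, target) pairs
gain = {"a": 0.4, "b": 0.6, "c": 0.9}  # hypothetical optimistic average-reward estimates

plausible = confidence_set(hypotheses, data, beta=0.05)
chosen = optimistic_choice(plausible, gain)
print(plausible, chosen)               # ['a', 'b'] b
print(len(switch_times(1000)))         # 10
```

Model "c" promises the highest gain but fits the data poorly, so the confidence set excludes it and optimism then prefers "b" over "a". Under the doubling schedule, a horizon of 1000 steps triggers only 10 re-plans, which illustrates why low-switching schemes keep policy updates rare.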
