Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation
Jianliang He, Han Zhong, Zhuoran Yang

TL;DR
This paper introduces LOOP (Local-fitted Optimization with OPtimism), an algorithmic framework for learning infinite-horizon average-reward MDPs (AMDPs) with general function approximation, together with a unified regret analysis that covers a range of AMDP models.
Contribution
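The paper proposes the LOOP algorithm and the average-reward generalized eluder coefficient (AGEC), a complexity measure that unifies the analysis of almost all previously known tractable AMDP models and supports theoretical regret guarantees for them.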
Findings
LOOP achieves sublinear regret bounds in general AMDP settings.
The AGEC complexity measure captures exploration challenges across diverse AMDP models.
The framework encompasses linear, kernel, and Bellman eluder dimension-based AMDPs.
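For reference, regret in the average-reward setting is usually measured against the optimal long-run average reward. The block below is a sketch of that standard definition, not a statement quoted from the paper; the symbols J^*, r, s_t, a_t, and T are assumed notation, and the paper's own bound may be stated with high probability or in expectation.

```latex
% Standard definitions, assumed here for illustration; not quoted from the paper.
% J^*: optimal long-run average reward.
% (s_t, a_t): state-action pair visited by the learner at step t.
% r: reward function; T: total number of interaction steps.
\mathrm{Reg}(T) \;=\; \sum_{t=1}^{T} \bigl( J^{*} - r(s_t, a_t) \bigr),
\qquad
\text{sublinear regret: } \lim_{T \to \infty} \frac{\mathrm{Reg}(T)}{T} = 0 .
```

Under this reading, a sublinear regret bound means the learner's average reward over T steps approaches the optimal average reward J^* as T grows.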
Abstract
We study infinite-horizon average-reward Markov decision processes (AMDPs) in the context of general function approximation. Specifically, we propose a novel algorithmic framework named Local-fitted Optimization with OPtimism (LOOP), which incorporates both model-based and value-based incarnations. In particular, LOOP features a novel construction of confidence sets and a low-switching policy updating scheme, which are tailored to the average-reward and function approximation setting. Moreover, for AMDPs, we propose a novel complexity measure -- average-reward generalized eluder coefficient (AGEC) -- which captures the challenge of exploration in AMDPs with general function approximation. Such a complexity measure encompasses almost all previously known tractable AMDP models, such as linear AMDPs and linear mixture AMDPs, and also includes newly identified cases such as kernel AMDPs and…
Taxonomy
Topics: Machine Learning and Data Classification · Water Systems and Optimization · Data Stream Mining Techniques
