Non-stationary Reinforcement Learning under General Function Approximation

Songtao Feng; Ming Yin; Ruiquan Huang; Yu-Xiang Wang; Jing Yang; Yingbin Liang

arXiv:2306.00861·cs.LG·June 2, 2023·1 cites

Open Access

TL;DR

This paper introduces a new complexity metric and a model-free algorithm for non-stationary reinforcement learning with general function approximation, providing the first dynamic regret analysis in this setting.

Contribution

It proposes the dynamic Bellman Eluder dimension and a confidence-set based algorithm, SW-OPEA, for non-stationary MDPs with general function approximation.

Findings

01. SW-OPEA achieves sublinear dynamic regret under small variation budgets.

02. The dynamic Bellman Eluder dimension unifies analysis for static and non-stationary MDPs.

03. The algorithm outperforms existing UCB-type algorithms in low variation scenarios.

Abstract

General function approximation is a powerful tool to handle large state and action spaces in a broad range of reinforcement learning (RL) scenarios. However, theoretical understanding of non-stationary MDPs with general function approximation is still limited. In this paper, we make the first such attempt. We first propose a new complexity metric called dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs, which subsumes the majority of existing tractable RL problems in static MDPs as well as non-stationary MDPs. Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA, which features a sliding window mechanism and a new confidence set design for non-stationary MDPs. We then establish an upper bound on the dynamic regret for the proposed algorithm, and show that SW-OPEA is provably efficient as long as the variation…
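The sliding window idea mentioned in the abstract can be illustrated with a toy example. The sketch below is not SW-OPEA and does not build confidence sets over a function class; it only shows, under simplifying assumptions, why discarding old samples helps when the environment drifts. The class `SlidingWindowMean`, the helper `track`, and the synthetic reward sequence are all hypothetical names introduced here for illustration.

```python
from collections import deque


class SlidingWindowMean:
    """Mean of the most recent `window` observations (hypothetical helper)."""

    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be positive")
        # A bounded deque drops the oldest sample once it is full.
        self.buffer = deque(maxlen=window)

    def update(self, value: float) -> float:
        """Add one observation and return the current windowed mean."""
        self.buffer.append(value)
        return sum(self.buffer) / len(self.buffer)


def track(rewards, window):
    """Return the running windowed estimate after each observation."""
    estimator = SlidingWindowMean(window)
    return [estimator.update(r) for r in rewards]


# Synthetic non-stationary signal: the mean reward jumps from 0.0 to 1.0.
rewards = [0.0] * 5 + [1.0] * 5

windowed = track(rewards, window=3)                  # forgets old samples
full_history = track(rewards, window=len(rewards))   # never forgets

print("windowed estimate at the end:    ", windowed[-1])
print("full-history estimate at the end:", full_history[-1])
```

With a window of 3, the final estimate uses only post-change samples, while the full-history estimate is still pulled toward the outdated value. A shorter window adapts faster but averages fewer samples, which is the usual trade-off that a window length has to balance against the amount of variation.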

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Evolutionary Algorithms and Applications