Non-stationary Reinforcement Learning under General Function Approximation
Songtao Feng, Ming Yin, Ruiquan Huang, Yu-Xiang Wang, Jing Yang, Yingbin Liang

TL;DR
This paper introduces a new complexity metric and a model-free algorithm for non-stationary reinforcement learning with general function approximation, providing the first dynamic regret analysis in this setting.
Contribution
It proposes the dynamic Bellman Eluder dimension and a confidence-set based algorithm, SW-OPEA, for non-stationary MDPs with general function approximation.
Findings
SW-OPEA achieves sublinear dynamic regret under small variation budgets.
The dynamic Bellman Eluder dimension unifies analysis for static and non-stationary MDPs.
The algorithm outperforms existing UCB-type algorithms in low-variation scenarios.
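For readers unfamiliar with the term, dynamic regret in an episodic non-stationary MDP is commonly defined as follows. The notation here (K episodes, initial state s_1^k, policy \pi^k, and value functions indexed by episode k) is an assumption following standard usage and is not taken from this summary; the paper's own definition may differ in detail.

```latex
\[
  \text{D-Regret}(K)
  \;=\;
  \sum_{k=1}^{K}
  \Bigl( V^{*}_{1,k}\bigl(s_1^{k}\bigr) - V^{\pi^{k}}_{1,k}\bigl(s_1^{k}\bigr) \Bigr)
\]
% V^{*}_{1,k}     : optimal value at step 1 under the MDP in force during episode k
% V^{\pi^k}_{1,k} : value of the learner's episode-k policy under that same MDP
% "Sublinear" means D-Regret(K) / K -> 0 as K grows.
```

Unlike static regret, the comparator is the optimal policy of each episode's own MDP rather than a single fixed policy, which is why the bound must depend on how much the environment varies.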
Abstract
General function approximation is a powerful tool to handle large state and action spaces in a broad range of reinforcement learning (RL) scenarios. However, theoretical understanding of non-stationary MDPs with general function approximation is still limited. In this paper, we make the first such attempt. We first propose a new complexity metric called the dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs, which subsumes the majority of existing tractable RL problems in static MDPs as well as non-stationary MDPs. Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA, which features a sliding window mechanism and a new confidence set design for non-stationary MDPs. We then establish an upper bound on the dynamic regret for the proposed algorithm, and show that SW-OPEA is provably efficient as long as the variation…
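The sliding window idea mentioned in the abstract can be illustrated with a generic sketch: estimates are built only from the most recent episodes, so stale data from an older environment is forgotten. This is not SW-OPEA itself. The class `SlidingWindowBuffer`, the helper `mean_reward`, the window size, and the toy reward drift are all hypothetical names and values chosen for illustration.

```python
from collections import deque


class SlidingWindowBuffer:
    """Keeps only the data from the most recent `window` episodes (illustrative)."""

    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be positive")
        # A deque with maxlen drops the oldest episode automatically.
        self._episodes = deque(maxlen=window)

    def add_episode(self, transitions):
        self._episodes.append(list(transitions))

    def transitions(self):
        # Flatten the retained episodes into one list of transitions.
        return [t for episode in self._episodes for t in episode]

    def __len__(self):
        return len(self._episodes)


def mean_reward(transitions):
    """Average reward over (state, action, reward) tuples."""
    rewards = [r for (_, _, r) in transitions]
    return sum(rewards) / len(rewards)


# Toy non-stationary environment (assumed): reward is 0.0 for the
# first 50 episodes and 1.0 for the last 10.
buffer = SlidingWindowBuffer(window=10)
full_history = []
for k in range(60):
    reward = 0.0 if k < 50 else 1.0
    episode = [("s", "a", reward)]
    buffer.add_episode(episode)
    full_history.extend(episode)

windowed_estimate = mean_reward(buffer.transitions())  # uses the last 10 episodes only
full_estimate = mean_reward(full_history)              # uses all 60 episodes
print(windowed_estimate, full_estimate)
```

In this toy run the windowed estimate reflects the current reward level, while the full-history average is still dominated by the outdated regime. The trade-off is that a shorter window tracks change faster but uses fewer samples.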
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Evolutionary Algorithms and Applications
