Adaptive Resolving Methods for Reinforcement Learning with Function Approximations
Jiashuo Jiang, Yiming Zong, Yinyu Ye

TL;DR
This paper introduces an adaptive LP-based algorithm for reinforcement learning with function approximation, providing tighter instance-dependent guarantees and demonstrating strong empirical performance.
Contribution
It develops a novel LP-based RL algorithm that achieves instance-dependent sample complexity guarantees, improving over worst-case bounds.
Findings
Achieves an $\tilde{O}(1/N)$ suboptimality gap with $N$ data points.
Outperforms previous $O(1/\sqrt{N})$ guarantees in favorable instances.
Shows strong empirical results demonstrating efficiency.
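To make the gap between the two rates concrete, here is a minimal arithmetic sketch. It assumes a bound of the form c/N (the fast rate) or c/sqrt(N) (the worst-case rate), with a hypothetical constant `c` that is not taken from the paper, and it ignores the logarithmic factors hidden by the tilde. The function name `samples_needed` is illustrative only. In practice the constant in the fast rate is instance-dependent and may be much larger than the worst-case constant, so the comparison only indicates the scaling in the target accuracy.

```python
import math


def samples_needed(eps, c=1.0, rate="fast"):
    """Smallest integer N whose bound is at most eps.

    rate="fast": bound is c / N        (1/N scaling)
    rate="slow": bound is c / sqrt(N)  (1/sqrt(N) scaling)
    The constant c is a placeholder assumption, not a value from the paper.
    """
    if eps <= 0 or c <= 0:
        raise ValueError("eps and c must be positive")
    if rate == "fast":
        return math.ceil(c / eps)
    if rate == "slow":
        return math.ceil((c / eps) ** 2)
    raise ValueError("rate must be 'fast' or 'slow'")


# Compare the two scalings for a few target gaps.
for eps in (0.5, 0.125, 0.03125):
    fast = samples_needed(eps, rate="fast")
    slow = samples_needed(eps, rate="slow")
    print(f"eps={eps}: fast rate N={fast}, slow rate N={slow}")
```

Under these assumptions, halving the target gap doubles the sample requirement at the fast rate but quadruples it at the worst-case rate.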
Abstract
Reinforcement learning (RL) problems are fundamental in online decision-making and have been instrumental in finding an optimal policy for Markov decision processes (MDPs). Function approximations are usually deployed to handle large or infinite state-action spaces. In this work, we consider RL problems with function approximation and develop a new algorithm to solve them efficiently. Our algorithm is based on the linear programming (LP) reformulation, and it re-solves the LP at each iteration as new data arrives. This resolving scheme enables our algorithm to achieve an instance-dependent sample complexity guarantee: more precisely, with $N$ data points, the output of our algorithm enjoys an instance-dependent $\tilde{O}(1/N)$ suboptimality gap. In comparison to the worst-case guarantee established in the previous literature, our instance-dependent…
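For orientation, the LP reformulation mentioned in the abstract can be illustrated with the textbook LP for a tabular discounted MDP. This is a sketch under assumed notation, not necessarily the formulation used in the paper: the discount factor $\gamma$, initial distribution $\mu$, reward $r$, transition kernel $P$, and occupancy measure $q$ are all symbols introduced here for illustration.

$$
\max_{q \ge 0} \; \sum_{s,a} r(s,a)\, q(s,a)
\quad \text{s.t.} \quad
\sum_{a} q(s',a) = (1-\gamma)\,\mu(s') + \gamma \sum_{s,a} P(s' \mid s,a)\, q(s,a)
\quad \text{for all } s'.
$$

An optimal $q$ induces a policy by normalizing over actions, $\pi(a \mid s) \propto q(s,a)$ wherever $\sum_a q(s,a) > 0$. In a data-driven version, one would presumably replace $r$ and $P$ with estimates built from the $N$ samples collected so far and solve the LP again as more data arrives, which is the role the abstract assigns to the resolving step.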
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Age of Information Optimization
