Adaptive Resolving Methods for Reinforcement Learning with Function Approximations

Jiashuo Jiang; Yiming Zong; Yinyu Ye

arXiv:2505.12037·cs.LG·May 20, 2025

TL;DR

This paper introduces an adaptive LP-based algorithm for reinforcement learning with function approximation, providing tighter instance-dependent guarantees and demonstrating strong empirical performance.

Contribution

It develops a novel LP-based RL algorithm that achieves instance-dependent sample complexity guarantees, improving over worst-case bounds.

Findings

01

Achieves an $\tilde{O}(1/N)$ suboptimality gap with $N$ data points.

02

Outperforms previous $O(1/\sqrt{N})$ guarantees in favorable instances.

03

Numerical experiments indicate that the algorithm is efficient in practice.
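
To make the gap between the two rates concrete, the following minimal sketch compares how fast a $1/N$ bound and a $1/\sqrt{N}$ bound shrink. It drops all constants, logarithmic factors, and instance-dependent terms, so the numbers illustrate the scaling only and are not results from the paper. The function names are hypothetical.

```python
import math


def fast_rate_gap(n: int) -> float:
    """Bound that scales like 1/N (constants and log factors dropped)."""
    return 1.0 / n


def slow_rate_gap(n: int) -> float:
    """Bound that scales like 1/sqrt(N) (constants dropped)."""
    return 1.0 / math.sqrt(n)


def samples_for_target(inv_epsilon: int) -> tuple[int, int]:
    """Samples needed to push each bound down to epsilon = 1/inv_epsilon.

    Returns (N under the 1/N rate, N under the 1/sqrt(N) rate).
    """
    return inv_epsilon, inv_epsilon ** 2


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(f"N={n:>9}: 1/N={fast_rate_gap(n):.6f}  1/sqrt(N)={slow_rate_gap(n):.6f}")
```

Under these simplifications, reaching a gap of 1/100 takes on the order of 100 samples at the $1/N$ rate and on the order of 10,000 samples at the $1/\sqrt{N}$ rate.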

Abstract

Reinforcement learning (RL) problems are fundamental in online decision-making, and RL has been instrumental in finding optimal policies for Markov decision processes (MDPs). Function approximation is usually deployed to handle large or infinite state-action spaces. In this work, we consider RL problems with function approximation and develop a new algorithm to solve them efficiently. Our algorithm is based on the linear programming (LP) reformulation, and it resolves the LP at each iteration as new data arrive. This resolving scheme enables our algorithm to achieve an instance-dependent sample complexity guarantee: with $N$ data points, the output of our algorithm enjoys an instance-dependent $\tilde{O}(1/N)$ suboptimality gap. In comparison to the $O(1/\sqrt{N})$ worst-case guarantee established in the previous literature, our instance-dependent…
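
For orientation, the LP reformulation mentioned in the abstract can be illustrated with the textbook LP for a tabular, infinite-horizon discounted MDP. This is a sketch under assumptions: the symbols $\mathcal{S}$, $\mathcal{A}$, $\gamma$, $r$, $P$, $\mu_0$, $V$, and $q$ are not taken from this page, and the paper's own formulation (which involves function approximation) may differ.

```latex
% Primal LP over value functions V (assumed discounted tabular setting)
\begin{align*}
\min_{V}\quad & (1-\gamma)\sum_{s\in\mathcal{S}} \mu_0(s)\,V(s) \\
\text{s.t.}\quad & V(s) \ge r(s,a) + \gamma \sum_{s'\in\mathcal{S}} P(s'\mid s,a)\,V(s')
  \quad \forall (s,a)\in\mathcal{S}\times\mathcal{A}.
\end{align*}

% Dual LP over occupancy measures q
\begin{align*}
\max_{q\ge 0}\quad & \sum_{s\in\mathcal{S}}\sum_{a\in\mathcal{A}} r(s,a)\,q(s,a) \\
\text{s.t.}\quad & \sum_{a\in\mathcal{A}} q(s',a)
  - \gamma \sum_{s\in\mathcal{S}}\sum_{a\in\mathcal{A}} P(s'\mid s,a)\,q(s,a)
  = (1-\gamma)\,\mu_0(s') \quad \forall s'\in\mathcal{S}.
\end{align*}
```

In this standard form, an optimal policy can be read off from an optimal dual solution by choosing, in each state, an action $a$ with $q(s,a) > 0$. A resolving scheme in this setting would re-estimate $r$ and $P$ from the data collected so far and solve the LP again at each iteration.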

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Age of Information Optimization