Actor-Critics Can Achieve Optimal Sample Efficiency
Kevin Tan, Wei Fan, Yuting Wei

TL;DR
This paper introduces a novel actor-critic algorithm that achieves near-optimal sample complexity for reinforcement learning with function approximation, addressing key open problems in sample efficiency and offline data utilization.
Contribution
The paper presents a new actor-critic method with provable sample efficiency and extends it to hybrid RL, including offline data initialization and a non-optimistic variant.
Findings
Achieves $O(1/\epsilon^2)$ sample complexity for $\epsilon$-optimal policies.
Provides a non-optimistic actor-critic algorithm with offline data, reducing sample requirements.
Demonstrates improved sample efficiency in hybrid RL settings through theoretical analysis and experiments.
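As a rough guide to reading these findings, the standard definitions behind "$\epsilon$-optimal" and "$O(1/\epsilon^2)$ sample complexity" can be sketched as follows. The notation here (the value functions $V_1^{*}$ and $V_1^{\pi}$, the initial state $s_1$, the episode count $T$, and the constants $C$ and $C'$) is assumed for illustration and may differ from the paper's. The constants hide all dependence on problem parameters other than $\epsilon$.

```latex
% epsilon-optimality of a policy \pi from the initial state s_1 (assumed notation)
V_1^{*}(s_1) - V_1^{\pi}(s_1) \le \epsilon

% O(1/\epsilon^2) sample complexity: number of trajectories needed,
% with C independent of \epsilon; halving \epsilon quadruples the bound
T(\epsilon) \le \frac{C}{\epsilon^{2}}
\quad\Longrightarrow\quad
T(\epsilon/2) \le \frac{4C}{\epsilon^{2}}

% Standard conversion from a \sqrt{T}-type regret bound to sample complexity
\sum_{t=1}^{T}\bigl(V_1^{*}(s_1) - V_1^{\pi_t}(s_1)\bigr) \le C'\sqrt{T}
\;\Longrightarrow\;
\frac{1}{T}\sum_{t=1}^{T}\bigl(V_1^{*}(s_1) - V_1^{\pi_t}(s_1)\bigr)
\le \frac{C'}{\sqrt{T}} \le \epsilon
\quad\text{once } T \ge \frac{C'^{2}}{\epsilon^{2}}
```

The last line is the usual online-to-batch argument: a policy drawn uniformly at random from $\pi_1, \dots, \pi_T$ is $\epsilon$-optimal in expectation once $T$ is of order $1/\epsilon^2$. This is why a square-root regret bound and an $O(1/\epsilon^2)$ sample complexity typically go together.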
Abstract
Actor-critic algorithms have become a cornerstone in reinforcement learning (RL), leveraging the strengths of both policy-based and value-based methods. Despite recent progress in understanding their statistical efficiency, no existing work has successfully learned an $\epsilon$-optimal policy with a sample complexity of $O(1/\epsilon^2)$ trajectories with general function approximation when strategic exploration is necessary. We address this open problem by introducing a novel actor-critic algorithm that attains a sample complexity of $O(dH^5 \log|\mathcal{A}|/\epsilon^2 + dH^4 \log|\mathcal{F}|/\epsilon^2)$ trajectories, and accompanying $\sqrt{T}$ regret when the Bellman eluder dimension $d$ does not increase with $T$ at more than a $\log T$ rate. Here, $\mathcal{F}$ is the critic function class, $\mathcal{A}$ is the action space, and $H$ is the horizon in the finite horizon…
Taxonomy
Topics: Philosophy and History of Science
