Variance-Reduced Conservative Policy Iteration

Naman Agarwal; Brian Bullins; Karan Singh

arXiv:2212.06283 · cs.LG · January 26, 2023

TL;DR

This paper introduces a variance-reduced variant of Conservative Policy Iteration. It lowers the sample complexity of reaching an ε-functional local optimum from O(ε^{-4}) to O(ε^{-3}) and, under state-coverage and policy-completeness assumptions, of reaching ε-global optimality from O(ε^{-3}) to O(ε^{-2}).

Contribution

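It proposes a variance-reduced variant of Conservative Policy Iteration, an algorithm that reduces reinforcement learning to a sequence of empirical risk minimization problems over the policy space. The variant needs fewer samples to reach a functional local optimum and, under state-coverage and policy-completeness assumptions, to reach global optimality.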

Findings

01

Reduces the sample complexity of producing an ε-functional local optimum from O(ε^{-4}) to O(ε^{-3}).

02

Achieves ε-global optimality with O(ε^{-2}) samples under state-coverage and policy-completeness assumptions, improving on the previously established O(ε^{-3}) requirement.

03

Obtains both improvements by applying variance reduction to Conservative Policy Iteration.
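
To get a feel for what these rates mean, the following worked arithmetic evaluates each bound at one illustrative accuracy, ε = 0.1. It assumes all hidden constants and other problem-dependent factors equal 1, which the O(·) notation does not guarantee, so the numbers indicate scaling only, not actual sample counts.

```latex
% Illustrative only: constants hidden by O(.) are assumed to be 1.
\begin{align*}
\varepsilon = 0.1:\qquad
\varepsilon^{-4} &= 10^{4} = 10{,}000, \\
\varepsilon^{-3} &= 10^{3} = 1{,}000, \\
\varepsilon^{-2} &= 10^{2} = 100.
\end{align*}
% Each unit drop in the exponent saves a factor of 1/\varepsilon = 10 here.
```

Under that assumption, each improvement saves a factor of 1/ε, and the saving grows as the target accuracy tightens.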

Abstract

We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reductions-based algorithms exhibit local convergence in the function space, as opposed to the parameter space for policy gradient algorithms, and thus are unaffected by the possibly non-linear or discontinuous parameterization of the policy class. We propose a variance-reduced variant of Conservative Policy Iteration that improves the sample complexity of producing an $ε$-functional local optimum from $O(ε^{-4})$ to $O(ε^{-3})$. Under state-coverage and policy-completeness assumptions, the algorithm enjoys $ε$-global optimality after sampling $O(ε^{-2})$ times, improving upon the previously established $O(ε^{-3})$ sample requirement.
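
For readers unfamiliar with the base algorithm, here is a minimal sketch of one classic Conservative Policy Iteration update on a small tabular MDP. It is not the paper's variance-reduced method: advantages are computed exactly rather than estimated from samples, and a per-state argmax stands in for the empirical risk minimization step over a policy class. The function names (`policy_value`, `cpi_step`), the array layout, and the mixing weight `alpha` are illustrative assumptions.

```python
import numpy as np


def policy_value(P, R, pi, gamma):
    """Exact V^pi and Q^pi for a tabular MDP (hypothetical helper).

    P:  transitions, shape (S, A, S), each P[s, a] sums to 1
    R:  rewards, shape (S, A)
    pi: stochastic policy, shape (S, A), each row sums to 1
    """
    n_states = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state kernel under pi
    r_pi = np.einsum("sa,sa->s", pi, R)     # expected one-step reward under pi
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    Q = R + gamma * (P @ V)                 # shape (S, A)
    return V, Q


def cpi_step(P, R, pi, gamma, alpha):
    """One conservative update: mix pi with a policy greedy w.r.t. A^pi.

    Assumption: the per-state argmax replaces the ERM oracle, and exact
    advantages replace the sampled estimates a real run would use.
    """
    V, Q = policy_value(P, R, pi, gamma)
    advantage = Q - V[:, None]
    greedy = np.zeros_like(pi)
    greedy[np.arange(pi.shape[0]), advantage.argmax(axis=1)] = 1.0
    # Conservative mixture: move only a fraction alpha toward the greedy policy.
    return (1.0 - alpha) * pi + alpha * greedy
```

The update mixes policies directly rather than stepping in a parameter vector, which is the sense in which convergence is stated in function space. In a sample-based setting the advantages would be noisy estimates, and the cost of that estimation is what the sample-complexity bounds above count.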

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics · Machine Learning and Algorithms