Variance-Reduced Conservative Policy Iteration
Naman Agarwal, Brian Bullins, Karan Singh

TL;DR
This paper introduces a variance-reduced version of Conservative Policy Iteration that lowers the sample complexity of reaching an ε-functional local optimum from O(ε^{-4}) to O(ε^{-3}), and reaches ε-global optimality with O(ε^{-2}) samples under state-coverage and policy-completeness assumptions.
Contribution
It proposes a variance-reduced variant of Conservative Policy Iteration that reduces sample complexity and attains global optimality under state-coverage and policy-completeness assumptions.
Findings
Reduces the sample complexity of producing an ε-functional local optimum from O(ε^{-4}) to O(ε^{-3}).
Achieves ε-global optimality with O(ε^{-2}) samples under state-coverage and policy-completeness assumptions.
Improves upon the previously established sample requirements for Conservative Policy Iteration by applying variance reduction.
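To give a feel for what the change in exponent means, the short Python sketch below compares the rates numerically. It is only an illustration: the O(·) bounds hide constants and any other problem-dependent factors, so the function names (sample_scaling, savings_factor) and the printed ratios are assumptions made here for intuition, not sample counts reported by the paper.

```python
def sample_scaling(eps: float, exponent: int) -> float:
    """Return eps**(-exponent): the stated rate with all hidden constants dropped."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    return eps ** (-exponent)


def savings_factor(eps: float, old_exponent: int, new_exponent: int) -> float:
    """Ratio between the old and new rates at a given accuracy eps."""
    return sample_scaling(eps, old_exponent) / sample_scaling(eps, new_exponent)


if __name__ == "__main__":
    # Local-optimum rates from the Findings list: exponent 4 versus exponent 3.
    for eps in (0.1, 0.01):
        ratio = savings_factor(eps, old_exponent=4, new_exponent=3)
        print(f"eps={eps}: old rate / new rate is about {ratio:.0f}")
```

Going from ε^{-4} to ε^{-3} saves a factor of 1/ε, so the gap between the two rates widens as the target accuracy tightens.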
Abstract
We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reductions-based algorithms exhibit local convergence in the function space, as opposed to the parameter space for policy gradient algorithms, and thus are unaffected by the possibly non-linear or discontinuous parameterization of the policy class. We propose a variance-reduced variant of Conservative Policy Iteration that improves the sample complexity of producing an ε-functional local optimum from O(ε^{-4}) to O(ε^{-3}). Under state-coverage and policy-completeness assumptions, the algorithm enjoys ε-global optimality after sampling O(ε^{-2}) times, improving upon the previously established sample requirement.
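For readers unfamiliar with Conservative Policy Iteration, the Python sketch below shows the conservative mixture step of the classic algorithm in a tabular setting: the current policy is moved only a small step alpha toward a policy that is greedy with respect to estimated advantages. This is a minimal illustration under stated assumptions, not the variance-reduced algorithm of the paper, whose estimator the abstract does not spell out. The tabular representation, the helper names (greedy_policy, conservative_update), and the advantage values are hypothetical; in the reductions setting, the greedy step would be replaced by an empirical risk minimization oracle over the policy class.

```python
from typing import List

# policy[s][a] is the probability of taking action a in state s.
Policy = List[List[float]]


def greedy_policy(advantage: List[List[float]]) -> Policy:
    """Deterministic policy that picks the highest estimated advantage per state.

    Stands in for the ERM oracle; ties go to the lowest action index.
    """
    policy = []
    for row in advantage:
        best = max(range(len(row)), key=lambda a: row[a])
        policy.append([1.0 if a == best else 0.0 for a in range(len(row))])
    return policy


def conservative_update(current: Policy, target: Policy, alpha: float) -> Policy:
    """Mixture step: (1 - alpha) * current + alpha * target, state by state."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [(1.0 - alpha) * p + alpha * q for p, q in zip(cur_row, tgt_row)]
        for cur_row, tgt_row in zip(current, target)
    ]


if __name__ == "__main__":
    uniform = [[0.5, 0.5], [0.5, 0.5]]        # two states, two actions
    advantage = [[1.0, -1.0], [0.0, 2.0]]     # hypothetical advantage estimates
    updated = conservative_update(uniform, greedy_policy(advantage), alpha=0.2)
    print(updated)
```

Because the update is a convex combination of two policies, each row of the result is still a probability distribution, and the step takes place in the space of policies rather than in any parameter space, which matches the function-space view taken in the abstract.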
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics · Machine Learning and Algorithms
