# Ranking Policy Gradient

**Authors:** Kaixiang Lin, Jiayu Zhou

arXiv: 1906.09674 · 2019-11-27

## TL;DR

This paper introduces Ranking Policy Gradient (RPG), a policy gradient method that learns the optimal rank of a set of discrete actions. Combined with an off-policy learning framework, RPG substantially reduces sample complexity relative to state-of-the-art methods, and its sample complexity does not depend on the dimension of the state space.

## Contribution

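The paper proposes RPG, a policy gradient approach that learns the optimal rank of discrete actions. It also establishes that maximizing a lower bound of the return is equivalent to imitating a near-optimal policy without access to any oracle, which leads to a general off-policy learning framework that preserves optimality, reduces variance, and improves sample efficiency.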

## Key findings

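- When combined with the off-policy learning framework, RPG substantially reduces sample complexity compared to state-of-the-art methods.
- The sample complexity of RPG does not depend on the dimension of the state space, which makes RPG applicable to large-scale problems.
- The off-policy learning framework preserves optimality and reduces variance while improving sample efficiency.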

## Abstract

Sample inefficiency is a long-lasting problem in reinforcement learning (RL). The state-of-the-art approach estimates the optimal action values, which usually involves an extensive search over the state-action space and unstable optimization. Towards sample-efficient RL, we propose ranking policy gradient (RPG), a policy gradient method that learns the optimal rank of a set of discrete actions. To accelerate the learning of policy gradient methods, we establish the equivalence between maximizing the lower bound of return and imitating a near-optimal policy without accessing any oracles. These results lead to a general off-policy learning framework, which preserves the optimality, reduces variance, and improves the sample efficiency. Furthermore, the sample complexity of RPG does not depend on the dimension of the state space, which enables RPG for large-scale problems. We conduct extensive experiments showing that, when combined with the off-policy learning framework, RPG substantially reduces the sample complexity compared to the state of the art.
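
To make the idea of "learning the rank of discrete actions" concrete, here is a minimal sketch in plain Python. It is not taken from this summary and should not be read as the authors' implementation. It assumes a pairwise logistic model in which each action has a score and the preference for an action is the product of its pairwise "ranked above" probabilities. The function names (`pairwise_prob`, `ranking_preference`, `imitation_step`, `greedy_action`) and the learning rate are hypothetical, chosen only for illustration. The update imitates a single state-action pair that is assumed to come from a near-optimal trajectory.

```python
import math


def pairwise_prob(score_i, score_j):
    """Assumed model: probability that action i ranks above action j."""
    return 1.0 / (1.0 + math.exp(-(score_i - score_j)))


def ranking_preference(scores, i):
    """Product of pairwise probabilities that action i beats every other action.

    Note: these products are not normalized to sum to one across actions.
    """
    pref = 1.0
    for j, s_j in enumerate(scores):
        if j != i:
            pref *= pairwise_prob(scores[i], s_j)
    return pref


def imitation_step(scores, target, lr=0.5):
    """One gradient-ascent step on log ranking_preference(scores, target).

    `target` is the action taken in a (hypothetical) near-optimal trajectory.
    """
    new_scores = list(scores)
    for j, s_j in enumerate(scores):
        if j == target:
            continue
        # d/d score_target of log sigmoid(score_target - score_j)
        grad = 1.0 - pairwise_prob(scores[target], s_j)
        new_scores[target] += lr * grad  # push the target action up
        new_scores[j] -= lr * grad       # push the competing action down
    return new_scores


def greedy_action(scores):
    """Pick the top-ranked action, i.e. the one with the highest score."""
    return max(range(len(scores)), key=lambda a: scores[a])


if __name__ == "__main__":
    scores = [0.0, 0.0, 0.0]
    for _ in range(20):
        scores = imitation_step(scores, target=2)
    print(greedy_action(scores), ranking_preference(scores, 2))
```

Under these assumptions, only the relative order of the scores matters for action selection, which is the sense in which the sketch learns a ranking rather than calibrated action values.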

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.09674/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1906.09674/full.md

## References

68 references — full list in the complete paper: https://tomesphere.com/paper/1906.09674/full.md

---
Source: https://tomesphere.com/paper/1906.09674