Beyond Importance Sampling: Rejection-Gated Policy Optimization

Ziwu Sun; Zhen Gao; Jiyong Zhang; Jiaheng Li

arXiv:2604.14895 · cs.LG · April 17, 2026

TL;DR

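This paper introduces Rejection-Gated Policy Optimization (RGPO), a policy optimization method that selects which samples are trustworthy enough to drive a policy update instead of reweighting every sample by its importance ratio, improving stability and performance in reinforcement learning.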

Contribution

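RGPO replaces the importance sampling ratio with a smooth, differentiable acceptance gate, recovers the policy gradients of TRPO, PPO, and REINFORCE as specific choices of an effective gradient weight within a single framework, and guarantees bounded gradient variance and bounded bias.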

Findings

1. RGPO guarantees finite, bounded gradient variance even when importance ratios are heavy-tailed.
2. RGPO incurs only bounded, controllable bias and provides an approximate monotonic policy improvement guarantee.
3. In experiments, RGPO outperforms PPO on both reward and KL-divergence metrics.
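As a toy illustration of the mechanism behind the first finding (not the paper's proof or experimental setup), the sketch below compares the spread of the unbounded importance weight w(r) = r with that of a bounded weight when ratios are heavy-tailed. The ratio distribution (a Pareto law with tail index 1.5, rescaled so that E[r] = 1), the gate g(r) = 1 - exp(-r), and all function names are assumptions chosen for illustration; RGPO's actual gate may differ.

```python
import math
import random
import statistics

# Assumption: Pareto tail index < 2, so the ratio itself has infinite variance.
ALPHA = 1.5


def sample_ratios(n, seed=0):
    """Draw n toy heavy-tailed importance ratios with E[r] = 1 (hypothetical distribution)."""
    rng = random.Random(seed)
    # paretovariate(ALPHA) is >= 1 with mean ALPHA / (ALPHA - 1) = 3; divide by 3 so E[r] = 1.
    return [rng.paretovariate(ALPHA) / 3.0 for _ in range(n)]


def is_weight(r):
    """Plain importance-sampling weight w(r) = r: unbounded, inherits the heavy tail."""
    return r


def soft_gate_weight(r):
    """w(r) = g'(r) * r for the hypothetical gate g(r) = 1 - exp(-r); its maximum is 1/e at r = 1."""
    return r * math.exp(-r)


def weight_variances(n, seed=0):
    """Population variance of both weights over the same n sampled ratios."""
    ratios = sample_ratios(n, seed)
    v_is = statistics.pvariance([is_weight(r) for r in ratios])
    v_gate = statistics.pvariance([soft_gate_weight(r) for r in ratios])
    return v_is, v_gate


if __name__ == "__main__":
    # Popoviciu's inequality: values confined to [0, 1/e] have variance <= (1/e)^2 / 4.
    bound = (1.0 / math.e) ** 2 / 4
    for n in (10**3, 10**4, 10**5):
        v_is, v_gate = weight_variances(n)
        print(f"n={n:>6}: var[r]={v_is:10.3f}  var[gated w]={v_gate:.5f}  (bound {bound:.5f})")
```

Because the gated weight r * exp(-r) never exceeds 1/e, its variance stays under (1/e)^2 / 4 at every sample size, whereas the empirical variance of r fluctuates and tends to grow as more extreme ratios are drawn.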

Abstract

We propose a new perspective on policy optimization: rather than reweighting all samples by their importance ratios, an optimizer should select which samples are trustworthy enough to drive a policy update. Building on this view, we introduce Rejection-Gated Policy Optimization (RGPO), which replaces the importance sampling ratio r_theta = pi_theta / pi_old with a smooth, differentiable acceptance gate alpha_theta(s, a) = g(r_theta(s, a)) in the range [0, 1]. Unlike prior work that applies rejection sampling as a data-level heuristic before training, RGPO elevates rejection to an optimization principle: the gate participates directly in gradient computation and is implicitly updated alongside the policy. RGPO provides a unified framework: the policy gradients of TRPO, PPO, and REINFORCE all correspond to specific choices of the effective gradient weight w(r) = g'(r) * r. We prove that…
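The abstract's claim that these methods differ only in the effective gradient weight w(r) = g'(r) * r follows from the chain rule: the gradient of r_theta equals r_theta times the gradient of log pi_theta, so differentiating g(r_theta) yields g'(r) * r times the gradient of log pi_theta. The sketch below tabulates w(r) for a few choices of g. The mapping it uses (g(r) = r for the importance-weighted surrogate used by TRPO, g(r) = log r for the REINFORCE log-likelihood gradient, and a two-sided clip as a simplification that ignores PPO's min over clipped and unclipped terms) is a standard derivation, not the paper's own table. The smooth gate g(r) = 1 - exp(-r), the clip range EPS, and all function names are hypothetical.

```python
import math

# Hypothetical clip range for the simplified PPO-style gate.
EPS = 0.2

# Each function returns g'(r), the derivative of a gate/surrogate g evaluated at ratio r > 0.


def d_identity(r):
    # g(r) = r: importance-weighted surrogate (TRPO-style), so w(r) = r.
    return 1.0


def d_log(r):
    # g(r) = log r: w(r) = 1, the on-policy REINFORCE log-likelihood gradient.
    return 1.0 / r


def d_clip(r):
    # g(r) = clip(r, 1 - EPS, 1 + EPS): slope 1 inside the range, 0 outside.
    # Simplification: PPO's min() would zero only one side, depending on the advantage sign.
    return 1.0 if 1.0 - EPS < r < 1.0 + EPS else 0.0


def d_soft_gate(r):
    # Hypothetical smooth gate g(r) = 1 - exp(-r), with values in [0, 1) for r >= 0.
    return math.exp(-r)


def effective_weight(g_prime, r):
    """Per-sample weight on grad log pi_theta(a|s): w(r) = g'(r) * r."""
    return g_prime(r) * r


GATES = {"identity": d_identity, "log": d_log, "clip": d_clip, "soft": d_soft_gate}

if __name__ == "__main__":
    for r in (0.5, 1.0, 1.1, 3.0, 50.0):
        row = ", ".join(f"{name}={effective_weight(gp, r):.3f}" for name, gp in GATES.items())
        print(f"r={r:>5}: {row}")
```

The identity weight grows without bound as r increases, the clip weight drops to zero outside the trust region, and the hypothetical smooth gate stays differentiable while its weight peaks at 1/e (at r = 1) and decays toward zero for large ratios.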
