Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation

Mingfei Sun

arXiv:2605.18591·cs.LG·May 19, 2026

TL;DR

RAT estimates Tikhonov-regularized natural policy gradients through direct backpropagation, avoiding explicit Fisher matrix construction and conjugate-gradient solvers, and matches or exceeds established natural-gradient methods on continuous and visual control benchmarks.

Contribution

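The paper proposes RAT, which uses the Woodbury formula to recast Tikhonov-regularized natural policy gradients as vanilla policy gradients with a transformed advantage. The transformation is computed with randomized block Kaczmarz iterations on on-policy mini-batches, and the method comes with convergence guarantees.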

Findings

01

RAT matches or exceeds established natural-gradient methods across continuous and visual control benchmarks.

02

It avoids explicit Fisher matrix construction and conjugate-gradient solvers.

03

The method is simple to implement and needs no architecture-specific approximations.

Abstract

Natural policy gradients improve optimization by accounting for the geometry of distribution space, but their practical use is limited by the cost of estimating and inverting the Fisher matrix. We present Randomized Advantage Transformation (RAT), a method for estimating Tikhonov-regularized natural policy gradients via direct backpropagation. By applying the Woodbury formula, we reformulate the regularized natural policy gradients as vanilla policy gradients with a transformed advantage. RAT computes this transformation efficiently via randomized block Kaczmarz iterations on on-policy mini-batches, avoiding explicit Fisher construction, conjugate-gradient solvers, and architecture-specific approximations. We provide convergence guarantees for RAT and demonstrate empirically that it matches or exceeds established natural-gradient methods across continuous and visual control benchmarks,…
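
The reformulation described in the abstract can be illustrated with a small numerical sketch. It is not the paper's implementation, and every name in it (`J`, `adv`, `lam`, the two helper functions, the block size and iteration count) is an assumption made for illustration. Here `J` is taken to be an N x d matrix whose rows are per-sample score vectors, so the vanilla gradient is J^T A / N and the empirical Fisher matrix is J^T J / N. The push-through form of the Woodbury identity then gives (J^T J / N + lam I)^{-1} J^T A / N = J^T A' / N, where the transformed advantage A' solves the N x N system (J J^T / N + lam I) A' = A. The sketch solves that system once directly and once with a generic randomized block Kaczmarz iteration; unlike the method in the paper, it forms `J` explicitly for clarity.

```python
import numpy as np

def transformed_advantage_direct(J, adv, lam):
    """Solve (J J^T / N + lam I) x = adv with a dense solver (reference only)."""
    N = J.shape[0]
    M = J @ J.T / N + lam * np.eye(N)
    return np.linalg.solve(M, adv)

def transformed_advantage_kaczmarz(J, adv, lam, block_size=8, iters=2000, seed=0):
    """Generic randomized block Kaczmarz on the same system (illustrative)."""
    rng = np.random.default_rng(seed)
    N = J.shape[0]
    x = np.zeros(N)
    for _ in range(iters):
        # Sample a random block of rows (a stand-in for a mini-batch).
        idx = rng.choice(N, size=block_size, replace=False)
        # Rows of M = J J^T / N + lam I for this block, without forming all of M.
        M_blk = J[idx] @ J.T / N
        M_blk[np.arange(block_size), idx] += lam
        # Project x onto the solution set of the sampled block equations.
        resid = adv[idx] - M_blk @ x
        x += M_blk.T @ np.linalg.solve(M_blk @ M_blk.T, resid)
    return x

# Synthetic data: random score vectors and advantages (assumed, not from the paper).
rng = np.random.default_rng(1)
N, d, lam = 32, 8, 1.0
J = rng.standard_normal((N, d))
adv = rng.standard_normal(N)

# Regularized natural gradient computed the expensive way, via the d x d Fisher.
fisher = J.T @ J / N
nat_grad = np.linalg.solve(fisher + lam * np.eye(d), J.T @ adv / N)

# The same quantity as a vanilla gradient with a transformed advantage.
adv_direct = transformed_advantage_direct(J, adv, lam)
nat_grad_woodbury = J.T @ adv_direct / N

# Transformed advantage from the iterative solver.
adv_kaczmarz = transformed_advantage_kaczmarz(J, adv, lam)
nat_grad_kaczmarz = J.T @ adv_kaczmarz / N
```

In a real policy network the product J^T A' / N would be obtained by backpropagating a surrogate loss weighted by A', which is what makes the transformed-advantage view attractive; the explicit matrix here only serves to check the algebra on a toy problem.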
