Boosting CVaR Policy Optimization with Quantile Gradients

Yudong Luo; Erick Delage

arXiv:2601.22100·cs.LG·February 6, 2026

TL;DR

This paper augments CVaR policy gradient (CVaR-PG) with an expected quantile term, so that quantile gradients can draw on all sampled trajectories rather than only the tail. The authors report substantially better sample efficiency than CVaR-PG and consistently better results than other existing methods in risk-averse domains.

Contribution

It proposes augmenting CVaR with an expected quantile term. Quantile optimization admits a dynamic programming formulation that uses all sampled data, and the augmentation leaves the original CVaR objective unchanged.

Findings

01. Improved sample efficiency in CVaR policy optimization.

02. Outperforms existing CVaR-PG and other methods in risk-averse tasks.

03. Demonstrates effectiveness in domains with verifiable risk-averse behavior.

Abstract

Optimizing Conditional Value-at-Risk (CVaR) using policy gradient (a.k.a. CVaR-PG) suffers from significant sample inefficiency. This inefficiency stems from the fact that it focuses on tail-end performance and overlooks many sampled trajectories. We address this problem by augmenting CVaR with an expected quantile term. Quantile optimization admits a dynamic programming formulation that leverages all sampled data, thus improving sample efficiency. This does not alter the CVaR objective, since CVaR corresponds to the expectation of the quantile over the tail. Empirical results in domains with verifiable risk-averse behavior show that our algorithm within the Markovian policy class substantially improves upon CVaR-PG and consistently outperforms other existing methods.
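The relationship between CVaR and tail quantiles mentioned in the abstract can be checked numerically. The sketch below is not the paper's algorithm; it only illustrates, on synthetic returns, that a lower-tail CVaR estimate (the mean of the worst returns) matches the average of empirical quantiles over levels in (0, alpha), and that only about an alpha fraction of samples falls in that tail. The function names, the Gaussian return distribution, the choice alpha = 0.1, and the convention that lower returns are worse are all assumptions made for illustration.

```python
import numpy as np

def tail_mean_cvar(returns, alpha):
    """Lower-tail CVaR estimate: mean of the worst ceil(alpha * n) returns.

    Hypothetical helper; also returns the number of tail samples k.
    """
    z = np.sort(np.asarray(returns, dtype=float))
    k = int(np.ceil(alpha * len(z)))
    return z[:k].mean(), k

def quantile_average_cvar(returns, alpha, grid=10_000):
    """Average of empirical quantiles q_u over a uniform grid of u in (0, alpha)."""
    z = np.sort(np.asarray(returns, dtype=float))
    n = len(z)
    u = (np.arange(grid) + 0.5) * alpha / grid           # midpoint levels
    # Empirical quantile at level u is the ceil(n * u)-th smallest sample.
    idx = np.clip(np.ceil(n * u).astype(int) - 1, 0, n - 1)
    return z[idx].mean()

rng = np.random.default_rng(0)
returns = rng.normal(loc=1.0, scale=2.0, size=1000)      # synthetic episode returns
alpha = 0.1

cvar_tail, k = tail_mean_cvar(returns, alpha)
cvar_quant = quantile_average_cvar(returns, alpha)
print(f"tail mean: {cvar_tail:.4f}, quantile average: {cvar_quant:.4f}")
print(f"samples in the tail: {k} of {len(returns)}")
```

With these settings the two estimates agree up to floating-point error, and the tail holds 100 of the 1000 samples, which is the sense in which an estimator built only on the tail leaves most sampled trajectories unused.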

Taxonomy

Topics: Risk and Portfolio Optimization · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques