Boosting CVaR Policy Optimization with Quantile Gradients
Yudong Luo, Erick Delage

TL;DR
This paper introduces a novel approach to optimize CVaR in policy gradient methods by incorporating quantile gradients, significantly enhancing sample efficiency and outperforming existing methods in risk-averse domains.
Contribution
It proposes augmenting CVaR with an expected quantile term, enabling dynamic programming and better utilization of sampled data without changing the original CVaR objective.
Findings
Improved sample efficiency in CVaR policy optimization.
Outperforms existing CVaR-PG and other methods in risk-averse tasks.
Demonstrates effectiveness in domains with verifiable risk-averse behavior.
Abstract
Optimizing Conditional Value-at-Risk (CVaR) using policy gradient (a.k.a. CVaR-PG) faces significant challenges of sample inefficiency. This inefficiency stems from the fact that it focuses on tail-end performance and overlooks many sampled trajectories. We address this problem by augmenting CVaR with an expected quantile term. Quantile optimization admits a dynamic programming formulation that leverages all sampled data, thus improving sample efficiency. This does not alter the CVaR objective, since CVaR corresponds to the expectation of the quantile over the tail. Empirical results in domains with verifiable risk-averse behavior show that our algorithm within the Markovian policy class substantially improves upon CVaR-PG and consistently outperforms other existing methods.
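The relationship the abstract relies on, that CVaR at level alpha equals the average of the quantiles over the levels in (0, alpha), can be checked numerically. The sketch below is illustrative only and is not the authors' algorithm: it assumes lower-tail CVaR of returns, a hypothetical Gaussian return distribution, and helper names (`empirical_cvar`, `cvar_from_quantiles`) invented for this example. It also reports what fraction of sampled returns falls in the tail, which is the portion a tail-only estimator such as CVaR-PG would draw on.

```python
import numpy as np

def empirical_cvar(returns, alpha):
    """Mean of the worst ceil(alpha * n) returns (lower-tail CVaR)."""
    sorted_returns = np.sort(returns)
    k = int(np.ceil(alpha * len(sorted_returns)))
    return sorted_returns[:k].mean(), k

def cvar_from_quantiles(returns, alpha, num_levels=1000):
    """Average of empirical quantiles at midpoint levels spread over (0, alpha)."""
    levels = (np.arange(num_levels) + 0.5) * alpha / num_levels
    return np.quantile(returns, levels).mean()

# Hypothetical trajectory returns; the distribution is an assumption for illustration.
rng = np.random.default_rng(0)
returns = rng.normal(loc=1.0, scale=2.0, size=10_000)
alpha = 0.1

tail_mean, tail_count = empirical_cvar(returns, alpha)
quantile_avg = cvar_from_quantiles(returns, alpha)
tail_fraction = tail_count / len(returns)

print(f"CVaR as tail mean:         {tail_mean:.3f}")
print(f"CVaR as quantile average:  {quantile_avg:.3f}")
print(f"Fraction of samples in the tail: {tail_fraction:.2f}")
```

The two estimates agree up to discretization error, while only about an alpha fraction of the samples contributes to the tail mean. This is the sample-inefficiency gap that the quantile-based term is meant to close.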
Taxonomy
Topics: Risk and Portfolio Optimization · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques
