KFCPO: Kronecker-Factored Approximated Constrained Policy Optimization
Joonyoung Lim, Younghwan Yoo

TL;DR
KFCPO introduces a scalable second-order Safe RL algorithm that efficiently balances reward maximization with safety constraints using Kronecker-Factored Approximate Curvature and adaptive gradient manipulation.
Contribution
It combines K-FAC-based natural gradient updates with a novel safety-aware gradient adjustment mechanism for improved Safe RL performance.
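As a rough illustration of a K-FAC natural gradient step, the sketch below handles a single fully connected layer y = W a. It assumes the standard K-FAC factorization of that layer's Fisher block, F ≈ A ⊗ G. Here A is the second moment of the layer inputs and G is the second moment of the back-propagated log-probability gradients. Damping is added to each factor separately, which is a common simplification. The function name, the `damping` parameter, and the array shapes are hypothetical and are not taken from the KFCPO paper.

```python
import numpy as np


def kfac_natural_gradient(grad_W, acts, out_grads, damping=1e-3):
    """Approximate F^{-1} grad for one linear layer y = W a using K-FAC.

    grad_W:    (out, in) gradient of the surrogate objective w.r.t. W
    acts:      (N, in)   layer inputs a for a batch of N samples
    out_grads: (N, out)  per-sample gradients of log pi w.r.t. the layer output
    """
    n = acts.shape[0]
    # Kronecker factors of the layer's Fisher block: F ~= kron(A, G)
    A = acts.T @ acts / n              # (in, in) input second moment
    G = out_grads.T @ out_grads / n    # (out, out) output-gradient second moment
    # Factored Tikhonov damping keeps both factors invertible
    A_d = A + damping * np.eye(A.shape[0])
    G_d = G + damping * np.eye(G.shape[0])
    # kron(A, G)^{-1} vec(V) = vec(G^{-1} V A^{-1}) for column-stacked vec;
    # solve() avoids forming explicit inverses, and A_d is symmetric.
    left = np.linalg.solve(G_d, grad_W)          # G^{-1} V
    return np.linalg.solve(A_d, left.T).T        # (G^{-1} V) A^{-1}
```

Only the (in × in) and (out × out) factors are inverted, so the cost scales with layer width rather than with the total parameter count. This closed-form, per-layer solve is what lets K-FAC avoid the iterative conjugate-gradient approximation that CPO-style methods use for F^{-1}g.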
Findings
Achieves 10.3% to 50.2% higher average return than baselines.
Effectively balances safety constraints with reward optimization.
Demonstrates superior performance on Safety Gymnasium environments.
Abstract
We propose KFCPO, a novel Safe Reinforcement Learning (Safe RL) algorithm that combines scalable Kronecker-Factored Approximate Curvature (K-FAC)-based second-order policy optimization with safety-aware gradient manipulation. KFCPO leverages K-FAC to perform efficient and stable natural gradient updates by approximating the Fisher Information Matrix (FIM) in a layerwise, closed-form manner, avoiding iterative approximation overheads. To address the trade-off between reward maximization and constraint satisfaction, we introduce a margin-aware gradient manipulation mechanism that adaptively adjusts the influence of reward and cost gradients based on the agent's proximity to safety boundaries. This method blends gradients using a direction-sensitive projection, eliminating harmful interference and avoiding abrupt changes caused by fixed hard thresholds. Additionally, a minibatch-level KL…
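The abstract does not spell out the exact blending rule. The following is therefore only a hypothetical sketch of what a margin-aware, direction-sensitive combination of reward and cost gradients could look like. It assumes the margin is the constraint limit minus the current cost estimate, and it uses a smooth sigmoid weight in place of a fixed hard threshold. It also borrows a PCGrad-style projection to remove the part of the reward gradient that would increase cost. The function name, `temperature`, and the projection rule are illustrative assumptions, not KFCPO's published update.

```python
import numpy as np


def margin_aware_direction(g_r, g_c, cost, limit, temperature=1.0):
    """Hypothetical margin-aware blend of reward and cost gradients.

    g_r:   gradient of expected return (to be ascended)
    g_c:   gradient of expected cost (to be descended)
    cost:  current cost estimate J_c; limit: constraint threshold d
    """
    g_r = np.asarray(g_r, dtype=float)
    safe_dir = -np.asarray(g_c, dtype=float)   # direction that lowers cost
    margin = limit - cost                       # > 0 means inside the safe set
    # Smooth sigmoid weight (tanh form avoids overflow): ~1 far inside the
    # safe set, 0.5 on the boundary, ~0 when the constraint is violated.
    w = 0.5 * (1.0 + np.tanh(margin / (2.0 * temperature)))
    r_dir = g_r
    dot = r_dir @ safe_dir
    if dot < 0:
        # Conflict (assumed PCGrad-style rule): drop the component of g_r
        # that would raise the cost; safe_dir is nonzero whenever dot < 0.
        r_dir = r_dir - dot / (safe_dir @ safe_dir) * safe_dir
    return w * r_dir + (1.0 - w) * safe_dir
```

With a zero margin the weight is 0.5. For g_r = [1, 1] and g_c = [0, 1], the conflicting component is removed from g_r, and the function returns [0.5, -0.5]. By construction, the inner product of the returned direction with -g_c is never negative. A step along it therefore does not increase cost to first order, while the weight shifts smoothly from reward to safety as the margin shrinks.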
