Optimistic Exploration for Risk-Averse Constrained Reinforcement Learning
James McCarthy, Radu Marinescu, Elizabeth Daly, Ivana Dusparic

TL;DR
This paper introduces ORAC, an exploration-based method for risk-averse constrained reinforcement learning that balances reward maximization against safety constraints by using confidence bounds on the reward and cost value functions, yielding better policies in safety-critical tasks.
Contribution
The paper proposes ORAC, a novel exploration strategy for risk-averse constrained reinforcement learning that handles risk and safety constraints while avoiding the overly conservative exploration of existing risk-averse approaches.
Findings
ORAC prevents convergence to sub-optimal policies.
It significantly improves reward-cost trade-offs.
It is effective in continuous control and energy management tasks.
Abstract
Risk-averse Constrained Reinforcement Learning (RaCRL) aims to learn policies that minimise the likelihood of rare and catastrophic constraint violations caused by an environment's inherent randomness. In general, risk-aversion leads to conservative exploration of the environment, which typically results in convergence to sub-optimal policies that fail to adequately maximise reward or, in some cases, fail to achieve the goal. In this paper, we propose an exploration-based approach for RaCRL called Optimistic Risk-averse Actor Critic (ORAC), which constructs an exploratory policy by maximising a local upper confidence bound of the state-action reward value function whilst minimising a local lower confidence bound of the risk-averse state-action cost value function. Specifically, at each step, the weighting assigned to the cost value is increased or decreased if it exceeds or falls below…
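The abstract describes the exploratory policy only at a high level, so the following is a minimal Python sketch under several stated assumptions, not the authors' implementation. It assumes scalar states and actions, measures uncertainty by the spread of a small ensemble of critics (mean plus or minus `beta` times the standard deviation), and assumes the cost critics already return risk-averse estimates. It swaps in a simple grid search over actions near the policy's mean action for whatever "local" optimisation the paper actually uses. Because the abstract is truncated before saying what the cost value is compared against, the weight update uses a hypothetical fixed `threshold`. All function names (`ensemble_stats`, `optimistic_score`, `exploratory_action`, `update_cost_weight`) and parameters (`beta_r`, `beta_c`, `lam`, `radius`, `threshold`, `step`) are hypothetical.

```python
import math
from typing import Callable, List, Tuple

# Minimal sketch. Ensemble-based confidence bounds, the grid search, and the
# threshold rule are illustrative assumptions, not the paper's method.

# Hypothetical critic type: maps (state, action) to a scalar value estimate.
Critic = Callable[[float, float], float]


def ensemble_stats(critics: List[Critic], state: float, action: float) -> Tuple[float, float]:
    """Mean and population standard deviation of an ensemble of critics."""
    values = [q(state, action) for q in critics]
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(var)


def optimistic_score(reward_critics: List[Critic], cost_critics: List[Critic],
                     state: float, action: float,
                     beta_r: float, beta_c: float, lam: float) -> float:
    """Upper confidence bound on reward value minus weighted lower bound on cost value."""
    mu_r, sd_r = ensemble_stats(reward_critics, state, action)
    mu_c, sd_c = ensemble_stats(cost_critics, state, action)
    ucb_reward = mu_r + beta_r * sd_r  # optimistic about reward
    lcb_cost = mu_c - beta_c * sd_c    # optimistic about the (assumed risk-averse) cost
    return ucb_reward - lam * lcb_cost


def exploratory_action(reward_critics: List[Critic], cost_critics: List[Critic],
                       state: float, policy_mean: float, radius: float,
                       n_candidates: int, beta_r: float, beta_c: float,
                       lam: float) -> float:
    """Pick the best-scoring action on an evenly spaced grid around the policy mean."""
    if n_candidates < 2:
        raise ValueError("n_candidates must be at least 2")
    step = 2.0 * radius / (n_candidates - 1)
    candidates = [policy_mean - radius + i * step for i in range(n_candidates)]
    return max(candidates, key=lambda a: optimistic_score(
        reward_critics, cost_critics, state, a, beta_r, beta_c, lam))


def update_cost_weight(lam: float, cost_estimate: float, threshold: float,
                       step: float = 0.1) -> float:
    """Hypothetical rule: raise lam when the cost estimate exceeds `threshold`,
    lower it otherwise, and never let it drop below zero."""
    if cost_estimate > threshold:
        return lam + step
    return max(0.0, lam - step)
```

In this sketch, a larger `beta_r` makes actions on which the reward critics disagree look more attractive, while a larger `lam` penalises actions with a high estimated cost more heavily; `update_cost_weight` is what moves `lam` between steps.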
