Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear

Zachary C. Lipton; Kamyar Azizzadenesheli; Abhishek Kumar; Lihong Li; Jianfeng Gao; Li Deng

arXiv:1611.01211·cs.LG·March 15, 2018·49 cites

TL;DR

This paper introduces intrinsic fear, a learned reward shaping method that helps reinforcement learning agents avoid catastrophic states, improving safety and learning speed in complex environments.

Contribution

The paper proposes intrinsic fear, an approach that trains a fear model to predict the probability of imminent catastrophe and uses that score to penalize the Q-learning objective, improving safety and learning speed in deep reinforcement learning.

Findings

01 Intrinsic fear improves safety by helping agents avoid catastrophic states.

02 Intrinsic fear tends to speed up learning, owing to reward shaping.

03 Intrinsic-fear DQNs improve on several Atari games.

Abstract

Many practical environments contain catastrophic states that an optimal agent would visit infrequently or never. Even on toy problems, Deep Reinforcement Learning (DRL) agents tend to periodically revisit these states upon forgetting their existence under a new policy. We introduce intrinsic fear (IF), a learned reward shaping that guards DRL agents against periodic catastrophes. IF agents possess a fear model trained to predict the probability of imminent catastrophe. This score is then used to penalize the Q-learning objective. Our theoretical analysis bounds the reduction in average return due to learning on the perturbed objective. We also prove robustness to classification errors. As a bonus, IF models tend to learn faster, owing to reward shaping. Experiments demonstrate that intrinsic-fear DQNs solve otherwise pathological environments and improve on several Atari games.
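
The abstract says the fear model's score is used to penalize the Q-learning objective but does not spell out the formula. A minimal sketch of one plausible form follows, assuming the penalty is subtracted from the usual one-step bootstrapped target and scaled by a coefficient. The function name `fear_penalized_target` and the parameters `fear_factor`, `gamma`, and `done` are hypothetical, chosen for illustration; they are not taken from the paper.

```python
def fear_penalized_target(reward, next_q_values, fear_prob,
                          fear_factor=1.0, gamma=0.99, done=False):
    """Hypothetical helper: one-step Q-learning target minus a fear penalty.

    reward        -- environment reward for the transition
    next_q_values -- Q-value estimates for each action in the next state
    fear_prob     -- fear model's predicted probability (0..1) that the
                     next state is close to a catastrophe
    fear_factor   -- assumed scaling coefficient for the penalty
    gamma         -- discount factor
    done          -- True if the next state is terminal
    """
    if not 0.0 <= fear_prob <= 1.0:
        raise ValueError("fear_prob must be a probability in [0, 1]")

    # Standard bootstrapped value of the next state (zero if terminal).
    bootstrap = 0.0 if done else gamma * max(next_q_values)

    # Assumed form of the penalty: scale the predicted catastrophe
    # probability and subtract it from the ordinary target.
    return reward + bootstrap - fear_factor * fear_prob


if __name__ == "__main__":
    safe = fear_penalized_target(1.0, [0.5, 2.0], fear_prob=0.0,
                                 fear_factor=2.0, gamma=0.5)
    risky = fear_penalized_target(1.0, [0.5, 2.0], fear_prob=0.5,
                                  fear_factor=2.0, gamma=0.5)
    print(safe, risky)
```

With a fear probability of zero the target reduces to the ordinary Q-learning target, so in this sketch the penalty only changes learning near states the fear model flags as dangerous.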

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Q-Learning