Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor

arXiv:1809.02121 · cs.LG · February 26, 2019 · 77 cites

TL;DR

This paper introduces AE-DQN, a deep reinforcement learning architecture that improves learning efficiency by learning to eliminate sub-optimal actions from an external elimination signal, yielding faster and more robust learning than vanilla DQN in text-based games with over a thousand discrete actions.

Contribution

The paper presents a novel Action-Elimination Deep Q-Network that integrates an Action Elimination Network to filter out invalid actions, enhancing learning speed and robustness.
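The filtering step can be illustrated with a minimal sketch, assuming the elimination network outputs, for each action, a probability that the action is invalid, and that actions above a fixed threshold are removed before epsilon-greedy selection over the Q-values. The helper name `select_action`, the 0.5 threshold, and the fallback to the full action set are hypothetical choices for illustration, not the elimination rule used in the paper.

```python
import random
from typing import Optional, Sequence


def select_action(
    q_values: Sequence[float],
    elim_probs: Sequence[float],
    threshold: float = 0.5,
    epsilon: float = 0.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Epsilon-greedy selection over actions not flagged as invalid.

    Hypothetical helper: the fixed `threshold` rule is an assumption for
    illustration, not the elimination rule used in the paper.
    """
    if len(q_values) != len(elim_probs):
        raise ValueError("q_values and elim_probs must have the same length")
    rng = rng if rng is not None else random.Random()
    # Admissible actions: predicted probability of being invalid is below threshold.
    admissible = [a for a, p in enumerate(elim_probs) if p < threshold]
    if not admissible:
        # Assumed fallback: never leave the agent with an empty action set.
        admissible = list(range(len(q_values)))
    if rng.random() < epsilon:
        # Explore uniformly, but only among admissible actions.
        return rng.choice(admissible)
    # Exploit: greedy action among the admissible set.
    return max(admissible, key=lambda a: q_values[a])


if __name__ == "__main__":
    # Action 1 has the largest Q-value but is flagged as invalid.
    print(select_action([1.0, 5.0, 3.0, 4.0], [0.1, 0.9, 0.2, 0.3]))  # prints 3
```

In this sketch, exploration is restricted to the admissible set as well, so random exploration no longer spends steps on actions already predicted to be invalid; with a large action space, that reduction is where the speedup would come from.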

Findings

01

Considerable speedup over vanilla DQN in environments with large discrete action spaces

02

Improved robustness in text-based games with many actions

03

Effective action filtering using external elimination signals

Abstract

Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant. In such cases, it is sometimes easier to learn which actions not to take. In this work, we propose the Action-Elimination Deep Q-Network (AE-DQN) architecture that combines a Deep RL algorithm with an Action Elimination Network (AEN) that eliminates sub-optimal actions. The AEN is trained to predict invalid actions, supervised by an external elimination signal provided by the environment. Simulations demonstrate a considerable speedup and added robustness over vanilla DQN in text-based games with over a thousand discrete actions.

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Data Stream Mining Techniques

Methods: Q-Learning · Dense Connections · Convolution · Deep Q-Network