# Hypothesis-Driven Skill Discovery for Hierarchical Deep Reinforcement Learning

**Authors:** Caleb Chuck, Supawit Chockchowwat, Scott Niekum

arXiv: 1906.01408 · 2020-03-04

## TL;DR

This paper introduces HyPE, a hierarchical skill learning algorithm that improves sample efficiency in deep reinforcement learning by discovering objects and testing hypotheses about their controllability from raw pixel data.

## Contribution

The paper presents HyPE, a novel hypothesis-driven approach that enhances exploration and skill discovery in DRL through object-based hypotheses and hierarchical learning.

## Key findings

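- HyPE learns high-scoring policies an order of magnitude faster than several state-of-the-art reinforcement learning methods.
- HyPE discovers objects from raw pixel data and generates hypotheses about the controllability of observed changes in object state.
- The sample-efficiency gains are demonstrated in two domains: a simulated robotic block-pushing task and the Breakout benchmark.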

## Abstract

Deep reinforcement learning (DRL) is capable of learning high-performing policies on a variety of complex high-dimensional tasks, ranging from video games to robotic manipulation. However, standard DRL methods often suffer from poor sample efficiency, partially because they aim to be entirely problem-agnostic. In this work, we introduce a novel approach to exploration and hierarchical skill learning that derives its sample efficiency from intuitive assumptions it makes about the behavior of objects both in the physical world and simulations which mimic physics. Specifically, we propose the Hypothesis Proposal and Evaluation (HyPE) algorithm, which discovers objects from raw pixel data, generates hypotheses about the controllability of observed changes in object state, and learns a hierarchy of skills to test these hypotheses. We demonstrate that HyPE can dramatically improve the sample efficiency of policy learning in two different domains: a simulated robotic block-pushing domain, and a popular benchmark task: Breakout. In these domains, HyPE learns high-scoring policies an order of magnitude faster than several state-of-the-art reinforcement learning methods.
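
To make the idea of a controllability hypothesis concrete, the following is a minimal toy sketch and not the paper's implementation. It assumes that object state changes have already been extracted from pixels and logged as `(action, changed)` pairs. The function names, the `threshold` parameter, and the frequency-gap criterion are all illustrative assumptions. The sketch proposes the hypothesis "this action controls the object" only when the object changes noticeably more often under that action than it does on average.

```python
from collections import Counter


def change_rate_by_action(transitions):
    """Empirical P(object changed | action) from (action, changed) pairs."""
    totals = Counter()
    changes = Counter()
    for action, changed in transitions:
        totals[action] += 1
        if changed:
            changes[action] += 1
    return {action: changes[action] / totals[action] for action in totals}


def propose_hypothesis(transitions, threshold=0.5):
    """Return (action, gap) if one action stands out, else None.

    gap = P(changed | action) - P(changed) over all logged transitions.
    The threshold is an arbitrary illustrative value, not from the paper.
    """
    if not transitions:
        return None
    rates = change_rate_by_action(transitions)
    baseline = sum(1 for _, changed in transitions if changed) / len(transitions)
    best_action = max(rates, key=rates.get)
    gap = rates[best_action] - baseline
    return (best_action, gap) if gap >= threshold else None


# Hypothetical log: the object mostly moves only when "left" is pressed.
log = (
    [("left", True)] * 8
    + [("left", False)] * 2
    + [("noop", False)] * 10
    + [("right", False)] * 10
)
hypothesis = propose_hypothesis(log)
```

In a hierarchical setting, a hypothesis that passes such a check would become the target of a learned skill, whose success or failure then serves as further evidence for or against it. That step is omitted here.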

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.01408/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1906.01408/full.md

## References

53 references — full list in the complete paper: https://tomesphere.com/paper/1906.01408/full.md

---
Source: https://tomesphere.com/paper/1906.01408