# Accelerating Goal-Directed Reinforcement Learning by Model Characterization

**Authors:** Shoubhik Debnath, Gaurav Sukhatme, Lantao Liu

arXiv: 1901.01977 · 2019-01-09

## TL;DR

This paper introduces a hybrid reinforcement learning approach that combines model approximation from model-free methods with reachability analysis using Mean First Passage Times, leading to more sample-efficient algorithms.

## Contribution

It presents a novel hybrid framework and two algorithms, MFPT-Q and MFPT-DYNA, that improve sample efficiency by integrating model-based and model-free reinforcement learning techniques.
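This summary does not say how the approximate model is obtained from the model-free learner. As a rough illustration of the general idea of recovering a transition model from experience, the sketch below row-normalizes visit counts over observed `(state, next_state)` pairs. The count-based estimator, the function name `estimate_transition_matrix`, and the toy samples are all assumptions made for illustration, not the authors' procedure.

```python
import numpy as np

def estimate_transition_matrix(transitions, n_states):
    """Estimate P[s, s'] by row-normalizing counts of observed transitions.

    Hypothetical helper: a count-based estimate, assumed for illustration.
    """
    counts = np.zeros((n_states, n_states))
    for s, s_next in transitions:
        counts[s, s_next] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    visited = row_sums > 0
    # Avoid dividing by zero for unvisited states; they get a uniform row
    # so that every row remains a valid probability distribution.
    safe_sums = np.where(visited, row_sums, 1.0)
    return np.where(visited, counts / safe_sums, 1.0 / n_states)

# Toy experience: (state, next_state) pairs from a 3-state chain.
samples = [(0, 1), (0, 1), (0, 0), (1, 2), (2, 2)]
P_hat = estimate_transition_matrix(samples, n_states=3)
```

With more samples the estimate tightens; a real agent would typically also condition the counts on the action taken.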

## Key findings

- MFPT-Q and MFPT-DYNA converge in fewer iterations than the state-of-the-art Q-Learning and DYNA methods they modify.
- As a result, they need fewer samples and fewer training trials to converge.
- The reported results are preliminary.

## Abstract

We propose a hybrid approach aimed at improving sample efficiency in goal-directed reinforcement learning. We do this via a two-step mechanism: first, we approximate a model from Model-Free reinforcement learning; then, we leverage this approximate model, along with a notion of reachability based on Mean First Passage Times, to perform Model-Based reinforcement learning. Building on this novel observation, we design two new algorithms, Mean First Passage Time based Q-Learning (MFPT-Q) and Mean First Passage Time based DYNA (MFPT-DYNA), which are fundamentally modified versions of state-of-the-art reinforcement learning techniques. Preliminary results show that our hybrid approaches converge in far fewer iterations than their state-of-the-art counterparts and therefore require far fewer samples and training trials.
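To make the reachability notion concrete: for a Markov chain with transition matrix `P`, the mean first passage time to a goal state is the expected number of steps needed to first reach it. The sketch below computes this standard quantity by solving the linear system `(I - Q) m = 1`, where `Q` is `P` restricted to non-goal states. It assumes the goal is reachable from every state (otherwise the system is singular). The function name and the toy chain are hypothetical, and how MFPT-Q and MFPT-DYNA use these values is not described in this summary.

```python
import numpy as np

def mean_first_passage_times(P, goal):
    """Expected number of steps to first reach `goal` from each state.

    Assumes `goal` is reachable from every state of the chain.
    """
    n = P.shape[0]
    others = [s for s in range(n) if s != goal]
    # Q holds transitions among non-goal states only.
    Q = P[np.ix_(others, others)]
    # m = 1 + Q m  <=>  (I - Q) m = 1
    m = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    mfpt = np.zeros(n)  # the goal itself has passage time 0
    mfpt[others] = m
    return mfpt

# Toy 3-state chain where state 2 is an absorbing goal.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
mfpt = mean_first_passage_times(P, goal=2)
print(mfpt)  # states closer to the goal get smaller values
```

States with a smaller mean first passage time are, in expectation, closer to the goal, which is what makes the quantity usable as a reachability signal.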

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.01977/full.md

## Figures

28 figures with captions in the complete paper: https://tomesphere.com/paper/1901.01977/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/1901.01977/full.md

---
Source: https://tomesphere.com/paper/1901.01977