# Improving interactive reinforcement learning: What makes a good teacher?

**Authors:** Francisco Cruz, Sven Magg, Yukie Nagai, Stefan Wermter

arXiv: 1904.06879 · 2019-04-16

## TL;DR

This paper investigates how different artificial trainer agents and system parameters influence the effectiveness of interactive reinforcement learning, aiming to identify qualities that make a good teacher for faster and more stable learning.

## Contribution

The paper analyzes the internal representations and characteristics of artificial agents to determine which agent types make better trainers, and finds that polymath agents advise more effectively than specialist agents.

## Key findings

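- Polymath trainer-agents lead to larger rewards and faster convergence of the reward signal than specialist trainer-agents.
- Learner-agents advised by a polymath trainer behave more stably, as measured by state visit frequency.
- Among the interaction parameters analyzed, consistency of feedback is much more relevant when learners differ in their obedience.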

## Abstract

Interactive reinforcement learning has become an important apprenticeship approach to speed up convergence in classic reinforcement learning problems. In this regard, a variant of interactive reinforcement learning is policy shaping, which uses a parent-like trainer to propose the next action to be performed and by doing so reduces the search space by advice. On some occasions, the trainer may be another artificial agent which in turn was trained using reinforcement learning methods to afterward become an advisor for other learner-agents. In this work, we analyze internal representations and characteristics of artificial agents to determine which agent may outperform others to become a better trainer-agent. Using a polymath agent as an advisor, as compared to a specialist agent, leads to a larger reward and faster convergence of the reward signal and also to a more stable behavior in terms of the state visit frequency of the learner-agents. Moreover, we analyze system interaction parameters in order to determine how influential they are in the apprenticeship process, where the consistency of feedback is much more relevant when dealing with different learner obedience parameters.
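The advice mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `select_action`, the parameters `feedback_prob`, `consistency`, `obedience`, and `epsilon`, the dictionary-based Q-table, and the way inconsistent advice is modeled (swapping in a different action) are all assumptions made for illustration. The sketch assumes a trainer that proposes an action on some steps, advice that is sometimes wrong, and a learner that sometimes ignores it.

```python
import random


def select_action(q_values, trainer_policy, state, actions, *,
                  feedback_prob, consistency, obedience, epsilon, rng):
    """Return (action, followed_advice) for one learner step.

    All parameter names are hypothetical, chosen for this sketch:
      feedback_prob: probability that the trainer gives advice on this step
      consistency:   probability that the advice matches the trainer's policy
      obedience:     probability that the learner follows the advice it gets
      epsilon:       exploration rate of the learner's own epsilon-greedy choice
    """
    # The learner's own epsilon-greedy choice, used when no advice is followed.
    if rng.random() < epsilon:
        own = rng.choice(actions)
    else:
        own = max(actions, key=lambda a: q_values.get((state, a), 0.0))

    # The trainer advises only on a fraction of the steps.
    if rng.random() >= feedback_prob:
        return own, False

    advice = trainer_policy[state]

    # Inconsistent feedback: replace the advice with some other action.
    if rng.random() >= consistency:
        others = [a for a in actions if a != advice]
        if others:
            advice = rng.choice(others)

    # A disobedient learner falls back on its own choice.
    if rng.random() < obedience:
        return advice, True
    return own, False


if __name__ == "__main__":
    rng = random.Random(0)
    actions = ["left", "right", "stay"]
    q = {("s0", "left"): 0.2, ("s0", "right"): 0.5}
    trainer = {"s0": "stay"}
    action, followed = select_action(
        q, trainer, "s0", actions,
        feedback_prob=0.3, consistency=0.9, obedience=0.8,
        epsilon=0.1, rng=rng,
    )
    print(action, followed)
```

Keeping the three probabilities as separate arguments makes it possible to vary feedback consistency and learner obedience independently, which is the kind of parameter study the abstract describes.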

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.06879/full.md

## Figures

43 figures with captions in the complete paper: https://tomesphere.com/paper/1904.06879/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/1904.06879/full.md

---
Source: https://tomesphere.com/paper/1904.06879