Learning to Teach Reinforcement Learning Agents

Anestis Fachantidis, Matthew E. Taylor, and Ioannis Vlahavas

arXiv:1707.09079·cs.AI·December 12, 2017

TL;DR

This paper studies how reinforcement learning teachers can advise heterogeneous students playing Pac-Man under a limited advice budget. It proposes an RL approach for distributing advice and highlights the coefficient of variation (CV) as a statistic for choosing the policies that generate advice.

Contribution

The paper formulates advice distribution under a budget as a learning problem and proposes a new RL algorithm for it. It also shows that the coefficient of variation (CV) is an important statistic for choosing the policies that generate advice.
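For contrast with a learned advising policy, the sketch below shows the kind of fixed heuristic the abstract says most prior methods rely on. It is not the paper's algorithm. It assumes a teacher that has Q-values for the student's current state and advises only when the spread between the best and worst action exceeds a threshold, a rule often called importance advising. The class name `BudgetedTeacher`, the `threshold` parameter, and the Q-values are all hypothetical.

```python
def importance(q_values):
    """Spread between the best and worst action value in one state."""
    return max(q_values) - min(q_values)


class BudgetedTeacher:
    """Hypothetical heuristic teacher: advises only in 'important' states."""

    def __init__(self, budget, threshold):
        self.budget = budget          # number of advice actions left
        self.threshold = threshold    # assumed importance cutoff

    def advise(self, q_values):
        """Return the index of the advised action, or None if no advice."""
        if self.budget <= 0:
            return None               # budget exhausted
        if importance(q_values) < self.threshold:
            return None               # state not important enough
        self.budget -= 1
        # Advise the teacher's greedy action for this state.
        return max(range(len(q_values)), key=q_values.__getitem__)


# Illustrative Q-values for four consecutive states (made-up numbers).
states = [
    [0.1, 0.2, 0.15],
    [0.0, 5.0, 1.0],
    [2.0, 0.5, 0.1],
    [9.0, 0.0, 1.0],
]
teacher = BudgetedTeacher(budget=2, threshold=1.0)
advice = [teacher.advise(q) for q in states]
print(advice)  # [None, 1, 0, None]
```

The last state is the most important one, yet it receives no advice because the budget was spent earlier. A fixed threshold cannot anticipate this, which is one motivation for treating advice distribution as a learning problem.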

Findings

01. CV is a key statistic for selecting advice policies

02. The proposed RL method adapts advice timing to student and task

03. Advice quality depends on teacher performance and variance

Abstract

In this article we study the transfer learning model of action advice under a budget. We focus on reinforcement learning teachers providing action advice to heterogeneous students playing the game of Pac-Man under a limited advice budget. First, we examine several critical factors affecting advice quality in this setting, such as the average performance of the teacher, its variance, and the importance of reward discounting in advising. The experiments show the non-trivial importance of the coefficient of variation (CV) as a statistic for choosing policies that generate advice. The CV statistic relates variance to the corresponding mean. Second, the article studies policy learning for distributing advice under a budget. Whereas most methods in the relevant literature rely on heuristics for advice distribution, we formulate the problem as a learning one and propose a novel RL algorithm…
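As a small illustration of the CV statistic mentioned in the abstract, the sketch below computes it as the standard deviation divided by the mean for two candidate teacher policies. The episode scores are made-up numbers, the function name is hypothetical, and the use of the population standard deviation is an assumption; the paper may use a different estimator.

```python
from statistics import mean, pstdev


def coefficient_of_variation(scores):
    """CV = standard deviation / |mean| (population std, by assumption)."""
    mu = mean(scores)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(scores) / abs(mu)


# Hypothetical per-episode game scores for two teacher policies.
teacher_a = [3000, 3100, 2900, 3000]   # lower mean, very consistent
teacher_b = [3500, 1500, 4500, 2900]   # higher mean, highly variable

cv_a = coefficient_of_variation(teacher_a)
cv_b = coefficient_of_variation(teacher_b)

print(f"teacher_a: mean={mean(teacher_a):.0f}, CV={cv_a:.3f}")
print(f"teacher_b: mean={mean(teacher_b):.0f}, CV={cv_b:.3f}")
```

In this made-up example, ranking by mean score alone would pick `teacher_b`, while the CV shows that its performance is far less consistent relative to its mean. That is the kind of trade-off a statistic relating variance to the mean can expose.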
