Theoretically-Grounded Policy Advice from Multiple Teachers in Reinforcement Learning Settings with Applications to Negative Transfer

Yusen Zhan; Haitham Bou Ammar; Matthew E. Taylor

arXiv:1604.03986 · cs.LG · April 15, 2016 · 32 cites

Open Access

TL;DR

This paper introduces a formal framework for advice from multiple teachers in reinforcement learning, provides regret bounds for the resulting algorithm, and characterizes the conditions under which negative transfer occurs.

Contribution

It formalizes multi-teacher advice in RL, proposes an algorithm combining exploration and advice, and analyzes negative transfer with regret bounds.

Findings

01. Good teachers improve learning speed

02. Bad teachers can hinder learning

03. Negative transfer can be formally characterized

Abstract

Policy advice is a transfer learning method where a student agent is able to learn faster via advice from a teacher. However, both this and other reinforcement learning transfer methods have received little theoretical analysis. This paper formally defines a setting where multiple teacher agents can provide advice to a student and introduces an algorithm to leverage both autonomous exploration and teachers' advice. Our regret bounds justify the intuition that good teachers help while bad teachers hurt. Using our formalization, we are also able to quantify, for the first time, when negative transfer can occur within such a reinforcement learning setting.
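
To make the intuition that good teachers help while bad teachers hurt concrete, the following is a minimal toy sketch in Python. It is not the paper's algorithm: it assumes a simple Bernoulli bandit rather than a full reinforcement learning problem, and the function `run_student` and its parameters (`teacher_arms`, `advice_prob`, `epsilon`) are hypothetical names introduced only for illustration. At each step the student either follows a randomly chosen teacher's recommended arm or falls back on its own epsilon-greedy exploration, and the function returns the cumulative expected regret.

```python
import random


def run_student(arm_means, teacher_arms, advice_prob, steps=1000, epsilon=0.1, seed=0):
    """Toy bandit student that mixes teacher advice with epsilon-greedy exploration.

    arm_means    -- true success probability of each arm (unknown to the student)
    teacher_arms -- arm recommended by each teacher (hypothetical fixed advice)
    advice_prob  -- probability of following a teacher at each step
    Returns the cumulative expected regret against the best arm.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # student's running mean reward per arm
    best_mean = max(arm_means)
    regret = 0.0

    for _ in range(steps):
        if teacher_arms and rng.random() < advice_prob:
            # Follow the advice of a uniformly chosen teacher.
            arm = rng.choice(teacher_arms)
        elif rng.random() < epsilon:
            # Autonomous exploration: try a random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit the student's own current estimates.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        regret += best_mean - arm_means[arm]

    return regret


if __name__ == "__main__":
    means = [0.2, 0.8]
    print("good teacher:", run_student(means, teacher_arms=[1], advice_prob=0.5))
    print("bad teacher: ", run_student(means, teacher_arms=[0], advice_prob=0.5))
    print("no teacher:  ", run_student(means, teacher_arms=[], advice_prob=0.0))
```

In this toy setting a student that always follows a teacher recommending the best arm incurs zero regret, while one that always follows a teacher recommending a suboptimal arm pays the full gap at every step, which is a crude analogue of negative transfer.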

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research