Bayesian Policy Reuse

Benjamin Rosman; Majd Hawasly; Subramanian Ramamoorthy

arXiv:1505.00284 · cs.AI · December 15, 2015

Open Access

TL;DR

This paper introduces a Bayesian approach to policy reuse for autonomous agents, enabling rapid adaptation to new but related tasks by selecting from a library of pre-learned policies, balancing exploration and exploitation.

Contribution

It formalizes the policy reuse problem and proposes an efficient Bayesian optimization algorithm for selecting policies in real time, reducing computational complexity.
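
The selection loop described above can be illustrated with a minimal sketch. It is not the paper's algorithm verbatim. It assumes a small library of policies, a set of known task types, and a Gaussian performance model giving the utility each policy is expected to earn on each task type. The names `MODEL`, `update_belief`, `select_policy`, and `run`, the policy and task labels, and all numeric values are hypothetical. The selection rule shown (sample a task type from the current belief, then play the best policy for it) is a simple Thompson-style stand-in for a rule that trades off exploration and exploitation.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def update_belief(belief, policy, utility, model):
    """Bayes update: posterior(task) is proportional to
    P(utility | task, policy) * prior(task)."""
    unnorm = {
        task: gaussian_pdf(utility, *model[task][policy]) * prior
        for task, prior in belief.items()
    }
    total = sum(unnorm.values())
    if total == 0.0:  # every likelihood underflowed; keep the prior
        return dict(belief)
    return {task: value / total for task, value in unnorm.items()}


def select_policy(belief, model, rng):
    """Sample a task type from the belief, then return the policy with the
    highest expected utility for that sampled type (assumed rule)."""
    tasks = list(belief)
    sampled = rng.choices(tasks, weights=[belief[t] for t in tasks])[0]
    return max(model[sampled], key=lambda pi: model[sampled][pi][0])


# Hypothetical performance model: task type -> policy -> (mean, std) of utility.
MODEL = {
    "task_a": {"pi_1": (1.0, 0.2), "pi_2": (0.2, 0.2)},
    "task_b": {"pi_1": (0.1, 0.2), "pi_2": (0.9, 0.2)},
}


def run(true_task, episodes=5, seed=0):
    """Simulate a few episodes against an unlabeled task instance."""
    rng = random.Random(seed)
    belief = {task: 1.0 / len(MODEL) for task in MODEL}  # uniform prior
    for _ in range(episodes):
        policy = select_policy(belief, MODEL, rng)
        mean, std = MODEL[true_task][policy]
        utility = rng.gauss(mean, std)  # simulated observed performance
        belief = update_belief(belief, policy, utility, MODEL)
    return belief


if __name__ == "__main__":
    print(run("task_b"))
```

In this sketch every episode is informative whichever policy is played, because the observed utility is compared against what each task type would have predicted for that policy. This is why a handful of episodes can be enough to concentrate the belief on one task type.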

Findings

01 Rapid convergence in simulated domains

02 Effective policy selection balancing exploration and exploitation

03 Applicable to short-duration, interactive tasks

Abstract

A long-lived autonomous agent should be able to respond online to novel instances of tasks from a familiar domain. Acting online requires 'fast' responses, in terms of rapid convergence, especially when the task instance has a short duration, such as in applications involving interactions with humans. These requirements can be problematic for many established methods for learning to act. In domains where the agent knows that the task instance is drawn from a family of related tasks, albeit without access to the label of any given instance, it can choose to act through a process of policy reuse from a library, rather than policy learning from scratch. In policy reuse, the agent has prior knowledge of the class of tasks in the form of a library of policies that were learnt from sample task instances during an offline training phase. We formalise the problem of policy reuse, and present an…

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms