Bayesian Policy Reuse
Benjamin Rosman, Majd Hawasly, Subramanian Ramamoorthy

TL;DR
This paper introduces a Bayesian approach to policy reuse for autonomous agents, enabling rapid adaptation to new but related tasks by selecting from a library of pre-learned policies, balancing exploration and exploitation.
Contribution
It formalizes the policy reuse problem and proposes an efficient algorithm, based on Bayesian optimization, for selecting policies from the library in real time at reduced computational cost.
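To make the idea of selecting from a policy library concrete, here is a minimal sketch under stated assumptions, not the authors' algorithm. It assumes a small set of known task types, a table of mean returns for each (task type, policy) pair learned offline, Gaussian noise on observed returns, and posterior sampling as the selection rule. The names `perf_mean`, `perf_std`, `select_policy`, `update_belief`, and `run_trials` are hypothetical and chosen only for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def select_policy(belief, perf_mean, rng):
    """Posterior sampling: draw a task type from the belief, then pick
    the policy with the highest mean return for that sampled task type."""
    task_ids = list(range(len(belief)))
    sampled_task = rng.choices(task_ids, weights=belief, k=1)[0]
    row = perf_mean[sampled_task]
    return max(range(len(row)), key=lambda p: row[p])


def update_belief(belief, policy, observed_return, perf_mean, perf_std):
    """Bayes rule: reweight each task type by how well it explains
    the return observed after running the chosen policy."""
    unnormalized = [
        b * gaussian_pdf(observed_return, perf_mean[t][policy], perf_std)
        for t, b in enumerate(belief)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        # Every task type gave zero likelihood; keep the previous belief.
        return list(belief)
    return [u / total for u in unnormalized]


def run_trials(true_task, perf_mean, perf_std, n_trials, seed=0):
    """Simulate repeated policy selection on one hidden task type
    (assumed setup) and return the final belief over task types."""
    rng = random.Random(seed)
    n_tasks = len(perf_mean)
    belief = [1.0 / n_tasks] * n_tasks  # uniform prior over task types
    for _ in range(n_trials):
        policy = select_policy(belief, perf_mean, rng)
        ret = rng.gauss(perf_mean[true_task][policy], perf_std)
        belief = update_belief(belief, policy, ret, perf_mean, perf_std)
    return belief


if __name__ == "__main__":
    # Rows are task types, columns are policies (illustrative numbers).
    perf_mean = [[1.0, 0.0], [0.0, 1.0]]
    print(run_trials(true_task=1, perf_mean=perf_mean, perf_std=0.5, n_trials=20))
```

In this sketch the belief over task types concentrates as returns are observed, so early choices explore the library while later choices exploit the policy that suits the inferred task. Other selection rules could be substituted for posterior sampling without changing the belief update.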
Findings
Rapid convergence in simulated domains
Effective policy selection balancing exploration and exploitation
Applicable to short-duration, interactive tasks
Abstract
A long-lived autonomous agent should be able to respond online to novel instances of tasks from a familiar domain. Acting online requires 'fast' responses, in terms of rapid convergence, especially when the task instance has a short duration, such as in applications involving interactions with humans. These requirements can be problematic for many established methods for learning to act. In domains where the agent knows that the task instance is drawn from a family of related tasks, albeit without access to the label of any given instance, it can choose to act through a process of policy reuse from a library, rather than policy learning from scratch. In policy reuse, the agent has prior knowledge of the class of tasks in the form of a library of policies that were learnt from sample task instances during an offline training phase. We formalise the problem of policy reuse, and present an…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
