Efficient Bayesian Policy Reuse with a Scalable Observation Model in Deep Reinforcement Learning
Jinmei Liu, Zhi Wang, Chunlin Chen, Daoyi Dong

TL;DR
This paper introduces an improved Bayesian policy reuse method in deep reinforcement learning that uses state transition samples for faster inference and a scalable observation model to reduce sample complexity, enabling efficient transfer and continual learning.
Contribution
It proposes a novel observation signal and a scalable observation model for more efficient and generalizable policy transfer in deep reinforcement learning.
Findings
Faster task inference using state transition samples.
Reduced sample complexity with a scalable observation model.
Effective continual learning with the extended model.
Abstract
Bayesian policy reuse (BPR) is a general policy transfer framework for selecting a source policy from an offline library by inferring the task belief from observation signals and a trained observation model. In this paper, we propose an improved BPR method to achieve more efficient policy transfer in deep reinforcement learning (DRL). First, most BPR algorithms use the episodic return as the observation signal, which contains limited information and cannot be obtained until the end of an episode. Instead, we employ the state transition sample, which is informative and instantaneous, as the observation signal for faster and more accurate task inference. Second, BPR algorithms usually require numerous samples to estimate the probability distribution of the tabular-based observation model, which may be expensive and even infeasible to learn and maintain, especially when using the…
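The belief inference described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 1-D dynamics, the per-task `offsets`, the Gaussian likelihood with width `sigma`, and the function names are all hypothetical stand-ins for the paper's learned observation model. It only shows how a belief over source tasks could be updated after every state transition sample instead of once per episode.

```python
import numpy as np


def update_belief(belief, likelihoods):
    """One Bayes step: posterior is proportional to likelihood times prior."""
    posterior = belief * likelihoods
    total = posterior.sum()
    if total == 0.0:
        # All likelihoods underflowed; keep the previous belief.
        return belief
    return posterior / total


def gaussian_likelihood(next_state, predicted, sigma=0.1):
    """Unnormalised Gaussian likelihood of an observed next state.

    Assumption: a Gaussian around each task's predicted next state stands in
    for the observation model; the paper's actual model may differ.
    """
    sq_err = np.sum((np.asarray(next_state) - np.asarray(predicted)) ** 2)
    return np.exp(-0.5 * sq_err / sigma**2)


def infer_task(transitions, offsets, sigma=0.1):
    """Update a uniform prior over source tasks from (s, a, s') samples.

    Hypothetical toy dynamics: task k predicts s' = s + a + offsets[k].
    """
    belief = np.full(len(offsets), 1.0 / len(offsets))
    for s, a, s_next in transitions:
        preds = s + a + offsets
        lik = np.array([gaussian_likelihood(s_next, p, sigma) for p in preds])
        belief = update_belief(belief, lik)
    return belief


# Three hypothetical source tasks; the transitions below come from the third.
offsets = np.array([-1.0, 0.0, 1.0])
transitions = [(0.0, 0.5, 1.5), (1.5, -0.5, 2.0)]
belief = infer_task(transitions, offsets)
best_policy_index = int(np.argmax(belief))  # reuse the policy for this task
```

Because the belief is refreshed at every step, the agent in this sketch could switch to the most probable source policy mid-episode, which is the advantage the abstract attributes to transition-level observation signals.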
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics · Machine Learning and Data Classification
