Efficient Bayesian Policy Reuse with a Scalable Observation Model in Deep Reinforcement Learning

Jinmei Liu; Zhi Wang; Chunlin Chen; Daoyi Dong

arXiv:2204.07729 · cs.LG · July 14, 2023 · 1 citation

Open Access

TL;DR

This paper introduces an improved Bayesian policy reuse method in deep reinforcement learning that uses state transition samples for faster inference and a scalable observation model to reduce sample complexity, enabling efficient transfer and continual learning.

Contribution

It proposes a novel observation signal and a scalable observation model for more efficient and generalizable policy transfer in deep reinforcement learning.

Findings

01 Faster task inference using state transition samples.

02 Reduced sample complexity with a scalable observation model.

03 Effective continual learning with the extended model.

Abstract

Bayesian policy reuse (BPR) is a general policy transfer framework for selecting a source policy from an offline library by inferring the task belief based on some observation signals and a trained observation model. In this paper, we propose an improved BPR method to achieve more efficient policy transfer in deep reinforcement learning (DRL). First, most BPR algorithms use the episodic return as the observation signal that contains limited information and cannot be obtained until the end of an episode. Instead, we employ the state transition sample, which is informative and instantaneous, as the observation signal for faster and more accurate task inference. Second, BPR algorithms usually require numerous samples to estimate the probability distribution of the tabular-based observation model, which may be expensive and even infeasible to learn and maintain, especially when using the…
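The belief inference described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the one-dimensional toy dynamics, the Gaussian likelihood, the `sigma` value, and the names `update_belief`, `gaussian_likelihood`, and `source_models` are all assumptions made for illustration. It shows only the general idea of updating a belief over source tasks after each state transition, rather than waiting for an episodic return.

```python
import math

def update_belief(belief, likelihoods):
    """One Bayes step: posterior is proportional to likelihood times prior."""
    unnormalized = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    if total == 0.0:
        # Every task assigned zero likelihood; keep the prior unchanged.
        return list(belief)
    return [u / total for u in unnormalized]

def gaussian_likelihood(observed, predicted, sigma=0.5):
    """Assumed observation model: isotropic Gaussian around the predicted next state."""
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.exp(-squared_error / (2 * sigma ** 2))

# Hypothetical per-task dynamics models: each maps (state, action) to a
# predicted next state. In practice these would be learned from source tasks.
source_models = [
    lambda s, a: [s[0] + a],      # task 0: action shifts the state forward
    lambda s, a: [s[0] - a],      # task 1: action shifts the state backward
    lambda s, a: [s[0] + 2 * a],  # task 2: action has doubled effect
]

# Uniform prior over the three source tasks.
belief = [1 / 3, 1 / 3, 1 / 3]

# Toy transitions (state, action, next_state) observed in the target task.
transitions = [([0.0], 1.0, [1.0]), ([1.0], 1.0, [2.1]), ([2.1], 0.5, [2.6])]

for state, action, next_state in transitions:
    likelihoods = [
        gaussian_likelihood(next_state, model(state, action))
        for model in source_models
    ]
    # The belief is refreshed after every single transition.
    belief = update_belief(belief, likelihoods)

# Reuse the source policy whose task currently has the highest belief.
best_task = max(range(len(belief)), key=belief.__getitem__)
```

In this toy setting the belief concentrates on the task whose model best predicts the observed transitions, and it does so within an episode, one step at a time.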

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics · Machine Learning and Data Classification