Efficient Reinforcement Learning from Demonstration Using Local Ensemble and Reparameterization with Split and Merge of Expert Policies

Yu Wang; Fang Liu

arXiv:2205.11019·cs.LG·May 24, 2022

Open Access

TL;DR

This paper introduces LEARN-SAM, a novel reinforcement learning method that effectively utilizes sub-optimal demonstrations through local policy weighting and split-merge mechanisms, improving learning efficiency and robustness.

Contribution

The paper proposes LEARN-SAM, combining a lambda-function for localizing expert policy weights and a split-merge mechanism to selectively use demonstration data, with theoretical guarantees for convergence.

Findings

1. LEARN-SAM outperforms existing methods in complex control tasks.
2. The lambda-function effectively localizes the useful parts of the demonstrations.
3. The split-merge mechanism improves learning speed and robustness.

Abstract

Current work on reinforcement learning (RL) from demonstrations often assumes that the demonstrations are samples from an optimal policy, an assumption that is unrealistic in practice. When demonstrations are generated by sub-optimal policies or contain only sparse state-action pairs, a policy learned from them may mislead an agent with incorrect or non-local action decisions. We propose a new method called Local Ensemble and Reparameterization with Split and Merge of expert policies (LEARN-SAM) to improve efficiency and make better use of sub-optimal demonstrations. First, LEARN-SAM employs a new concept, the lambda-function, based on a discrepancy measure between the current state and the demonstrated states, to "localize" the weights of the expert policies during learning. Second, LEARN-SAM employs a split-and-merge (SAM) mechanism by separating the helpful parts in each expert…
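
To make the idea of "localizing" expert weights concrete, here is a minimal sketch. It is not the paper's lambda-function, whose exact form is not given in this excerpt. It assumes, purely for illustration, that the discrepancy is the Euclidean distance from the current state to the nearest demonstrated state and that the weight decays with a Gaussian kernel. The names local_weights, nearest_demo_distance, and bandwidth are hypothetical.

```python
import math
from typing import List, Sequence

State = Sequence[float]


def nearest_demo_distance(state: State, demo_states: List[State]) -> float:
    """Assumed discrepancy: Euclidean distance to the closest demonstrated state."""
    return min(math.dist(state, s) for s in demo_states)


def local_weights(
    state: State,
    demos_per_expert: List[List[State]],
    bandwidth: float = 1.0,  # hypothetical kernel width, not from the paper
) -> List[float]:
    """Return one weight per expert, each between 0 and 1.

    The weight is 1 when the current state coincides with a state that
    the expert demonstrated and shrinks toward 0 as the state moves away,
    so an expert only influences the agent near its own demonstrations.
    """
    weights = []
    for demo_states in demos_per_expert:
        d = nearest_demo_distance(state, demo_states)
        weights.append(math.exp(-((d / bandwidth) ** 2)))
    return weights


# Expert 0 demonstrated states near the origin; expert 1 only far away.
demos = [
    [[0.0, 0.0], [1.0, 0.0]],
    [[3.0, 4.0]],
]
weights = local_weights([0.0, 0.0], demos, bandwidth=1.0)
print(weights)  # expert 0 gets weight 1.0, expert 1 a weight close to 0
```

Under these assumptions, an expert with no demonstrations near the current state contributes almost nothing there, which is the intuition behind weighting expert policies locally rather than globally.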

Taxonomy

Topics: Reinforcement Learning in Robotics · Viral Infectious Diseases and Gene Expression in Insects · Evolutionary Algorithms and Applications