Efficient Reinforcement Learning from Demonstration Using Local Ensemble and Reparameterization with Split and Merge of Expert Policies
Yu Wang, Fang Liu

TL;DR
This paper introduces LEARN-SAM, a novel reinforcement learning method that effectively utilizes sub-optimal demonstrations through local policy weighting and split-merge mechanisms, improving learning efficiency and robustness.
Contribution
The paper proposes LEARN-SAM, combining a lambda-function for localizing expert policy weights and a split-merge mechanism to selectively use demonstration data, with theoretical guarantees for convergence.
Findings
LEARN-SAM outperforms existing methods in complex control tasks.
The lambda-function effectively localizes useful demonstration parts.
The split-merge mechanism enhances learning speed and robustness.
Abstract
Current work on reinforcement learning (RL) from demonstrations often assumes the demonstrations are samples from an optimal policy, an unrealistic assumption in practice. When demonstrations are generated by sub-optimal policies or have sparse state-action pairs, a policy learned from them may mislead an agent with incorrect or non-local action decisions. We propose a new method called Local Ensemble and Reparameterization with Split and Merge of expert policies (LEARN-SAM) to improve efficiency and make better use of the sub-optimal demonstrations. First, LEARN-SAM employs a new concept, the lambda-function, based on a discrepancy measure between the current state and the demonstrated states to "localize" the weights of the expert policies during learning. Second, LEARN-SAM employs a split-and-merge (SAM) mechanism by separating the helpful parts in each expert…
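The abstract describes the lambda-function only at a high level: expert weights are localized by a discrepancy between the current state and the demonstrated states. The sketch below illustrates that general idea under assumptions that are not taken from the paper: the discrepancy is the Euclidean distance to each expert's nearest demonstrated state, the weighting is a Gaussian kernel, and the names `local_expert_weights` and `bandwidth` are hypothetical. It is not the authors' definition of the lambda-function.

```python
import numpy as np


def local_expert_weights(state, expert_demo_states, bandwidth=1.0):
    """Hypothetical locality weights, one per expert.

    An expert receives a larger weight when the current state lies close to
    some state that expert demonstrated. This is an illustrative stand-in,
    not the lambda-function defined in the paper.
    """
    state = np.asarray(state, dtype=float)

    # Assumed discrepancy: Euclidean distance from the current state to the
    # nearest demonstrated state of each expert.
    dists = np.array([
        np.min(np.linalg.norm(np.asarray(demo, dtype=float) - state, axis=1))
        for demo in expert_demo_states
    ])

    # Assumed kernel: Gaussian in the discrepancy, so far-away experts fade out.
    scores = np.exp(-(dists / bandwidth) ** 2)

    total = scores.sum()
    if total == 0.0:
        # Every expert is numerically "infinitely far": fall back to uniform.
        return np.full(len(dists), 1.0 / len(dists))
    return scores / total


# Two hypothetical experts with demonstrations in different regions.
expert_a = [[0.0, 0.0], [0.5, 0.0]]
expert_b = [[5.0, 5.0], [5.5, 5.0]]
weights = local_expert_weights([0.1, 0.0], [expert_a, expert_b])
```

In this toy setup the current state sits next to the first expert's demonstrations, so nearly all of the weight goes to that expert. A smaller `bandwidth` makes the weighting more local, while a larger one approaches a uniform ensemble.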
Taxonomy
Topics: Reinforcement Learning in Robotics · Viral Infectious Diseases and Gene Expression in Insects · Evolutionary Algorithms and Applications
