DEFT: Diverse Ensembles for Fast Transfer in Reinforcement Learning
Simeon Adebola, Satvik Sharma, Kaushik Shivakumar

TL;DR
DEFT is an ensemble-based reinforcement learning method that improves transfer and training efficiency in multimodal environments by training diverse policies and then synthesizing them into a new policy.
Contribution
The paper presents DEFT, a new ensemble approach that encourages policy diversity during training and synthesizes these policies for improved transfer in RL tasks.
Findings
DEFT produces diverse policies that capture the multimodality of the environment.
DEFT converges to high rewards faster than baseline methods.
Pretraining with DEFT enhances transfer performance in unseen environments.
Abstract
Deep ensembles have been shown to extend the positive effect seen in typical ensemble learning to neural networks and to reinforcement learning (RL). However, there is still much to be done to improve the efficiency of such ensemble models. In this work, we present Diverse Ensembles for Fast Transfer in RL (DEFT), a new ensemble-based method for reinforcement learning in highly multimodal environments that also improves transfer to unseen environments. The algorithm has two main phases: training the ensemble members, and synthesizing (or fine-tuning) them into a policy that works in a new environment. The first phase trains regular policy gradient or actor-critic agents in parallel, with an added loss term that encourages the policies to differ from each other. This causes the individual unimodal agents to explore the space…
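The first phase described in the abstract can be illustrated with a minimal sketch. The abstract excerpt does not specify the form of the diversity term, so everything here is an assumption made for illustration: the use of KL divergence between members' action distributions on shared states, the weight `beta`, and the helper names `kl_divergence`, `diversity_bonus`, and `diversity_regularized_losses` are all hypothetical and not taken from the paper.

```python
import math

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for two discrete action distributions given as lists."""
    return sum(pi * (math.log(max(pi, eps)) - math.log(max(qi, eps)))
               for pi, qi in zip(p, q))

def diversity_bonus(member_probs, i):
    """Mean KL from member i to every other member, averaged over shared states.

    member_probs[k][b] is member k's action distribution at state b.
    KL is an assumed choice of divergence, not the paper's stated one.
    """
    others = [k for k in range(len(member_probs)) if k != i]
    total = 0.0
    for k in others:
        per_state = [kl_divergence(p, q)
                     for p, q in zip(member_probs[i], member_probs[k])]
        total += sum(per_state) / len(per_state)
    return total / len(others)

def diversity_regularized_losses(pg_losses, member_probs, beta=0.1):
    """Subtract a weighted diversity bonus from each member's base loss.

    pg_losses[i] stands in for member i's policy gradient or actor-critic
    loss; beta is a hypothetical trade-off weight.
    """
    return [loss - beta * diversity_bonus(member_probs, i)
            for i, loss in enumerate(pg_losses)]

# Two ensemble members evaluated on one shared state with two actions.
probs = [
    [[0.9, 0.1]],  # member 0 prefers action 0
    [[0.1, 0.9]],  # member 1 prefers action 1
]
losses = diversity_regularized_losses([1.0, 1.0], probs, beta=0.1)

# Identical members receive no bonus, so their losses are unchanged.
same = diversity_regularized_losses([1.0, 1.0], [[[0.5, 0.5]], [[0.5, 0.5]]])
```

In this sketch, members whose action distributions differ receive a lower loss than members that agree, so minimizing the combined loss pushes the policies apart while each still optimizes its own return. A real implementation would compute the same quantity on differentiable tensors inside the training loop.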
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
Methods: Balanced Selection
