Trajectory First: A Curriculum for Discovering Diverse Policies
Cornelius V. Braun, Sayantan Auddy, Marc Toussaint

TL;DR
This paper introduces a two-stage curriculum that uses a spline-based trajectory prior to increase the behavioral diversity of reinforcement learning agents, particularly in complex tasks such as robot manipulation.
Contribution
It proposes a novel curriculum that first generates diverse, high-reward trajectories and then distills them into reactive, step-wise policies, increasing behavioral diversity while maintaining task performance.
Findings
The curriculum increases behavioral diversity in learned skills.
It maintains high task performance while enhancing diversity.
The empirical evaluation offers new insights into the challenges of diversity-targeted training.
Abstract
Being able to solve a task in diverse ways makes agents more robust to task variations and less prone to local optima. In this context, constrained diversity optimization has become a useful reinforcement learning (RL) framework for training a set of diverse agents in parallel. However, existing constrained-diversity RL methods often under-explore in complex tasks such as robot manipulation, resulting in limited behavioral diversity. We address this with a two-stage curriculum that introduces a spline-based trajectory prior as an inductive bias to produce diverse, high-reward behaviors in an initial stage, and then distills these behaviors into reactive, step-wise policies in a second stage. In our empirical evaluation, we provide novel insights into challenges of diversity-targeted training and show that our curriculum increases the diversity of learned skills while maintaining high…
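The abstract does not specify the spline type, the reward, the diversity measure, or the distillation method. The following is therefore only a toy sketch of the two-stage idea under stated assumptions, not the authors' implementation. It makes the following choices:

- It uses a 1-D point task whose goal is to end near a hypothetical GOAL.
- It densifies a few random knots with a uniform Catmull-Rom spline, which serves as the trajectory prior.
- Stage 1 applies a greedy filter. The filter keeps a trajectory only if it meets a minimum reward and differs from every kept trajectory by a minimum mean distance. This is a simple stand-in for constrained diversity optimization.
- Stage 2 distills each kept trajectory into a reactive policy with nearest-neighbour behaviour cloning over (phase, position) states.

All names (GOAL, N_KNOTS, STEPS, stage1_collect, distill, rollout) are hypothetical.

```python
import random

GOAL = 1.0      # hypothetical 1-D task: the trajectory should end near GOAL
N_KNOTS = 4     # spline knots per trajectory (start, 2 free via-points, end)
STEPS = 10      # interpolated time steps per spline segment


def catmull_rom(p0, p1, p2, p3, t):
    """Uniform Catmull-Rom segment from p1 (t=0) to p2 (t=1)."""
    t2, t3 = t * t, t * t * t
    return 0.5 * (
        2 * p1
        + (-p0 + p2) * t
        + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t2
        + (-p0 + 3 * p1 - 3 * p2 + p3) * t3
    )


def spline_trajectory(knots, steps_per_segment):
    """Densify a few knots into a smooth trajectory that passes through every knot."""
    padded = [knots[0]] + list(knots) + [knots[-1]]  # repeat end knots as tangent helpers
    traj = []
    for i in range(1, len(padded) - 2):
        p0, p1, p2, p3 = padded[i - 1 : i + 3]
        for s in range(steps_per_segment):
            traj.append(catmull_rom(p0, p1, p2, p3, s / steps_per_segment))
    traj.append(knots[-1])
    return traj


def reward(traj):
    # Hypothetical reward: negative distance of the final position to GOAL.
    return -abs(traj[-1] - GOAL)


def diversity(a, b):
    """Mean absolute distance between two equal-length trajectories."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def stage1_collect(n_candidates, reward_min, div_min, seed=0):
    """Stage 1: sample knots from the spline prior and keep trajectories that satisfy
    the reward constraint and differ enough from every trajectory already kept."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_candidates):
        via = [rng.uniform(-2.0, 2.0) for _ in range(N_KNOTS - 2)]
        knots = [0.0] + via + [GOAL + rng.gauss(0.0, 0.1)]
        traj = spline_trajectory(knots, STEPS)
        if reward(traj) >= reward_min and all(diversity(traj, k) >= div_min for k in kept):
            kept.append(traj)
    return kept


def distill(traj):
    """Stage 2: turn one trajectory into a reactive policy (phase, position) -> action."""
    T = len(traj) - 1
    data = [((k / T, traj[k]), traj[k + 1] - traj[k]) for k in range(T)]

    def policy(phase, pos):
        # Nearest-neighbour behaviour cloning: act on the current state, not a fixed plan.
        _, action = min(data, key=lambda d: (d[0][0] - phase) ** 2 + (d[0][1] - pos) ** 2)
        return action

    return policy, T


def rollout(policy, T, start=0.0):
    """Execute a distilled policy step by step from a start position."""
    pos, path = start, [start]
    for k in range(T):
        pos += policy(k / T, pos)
        path.append(pos)
    return path


if __name__ == "__main__":
    skills = stage1_collect(n_candidates=200, reward_min=-0.1, div_min=0.3)
    for traj in skills:
        policy, T = distill(traj)
        print(f"planned end {traj[-1]:+.3f}  distilled end {rollout(policy, T)[-1]:+.3f}")
```

The nearest-neighbour lookup keeps the sketch dependency-free. A practical version would more likely train a neural-network policy on the collected state-action pairs. That substitution, like every other choice here, is an assumption and is not taken from the paper.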
