ComSD: Balancing Behavioral Quality and Diversity in Unsupervised Skill Discovery
Xin Liu, Yaran Chen, Dongbin Zhao

TL;DR
This paper introduces ComSD, a novel unsupervised skill discovery method that balances exploration and diversity using contrastive rewards and dynamic weighting, leading to improved downstream task adaptation.
Contribution
ComSD introduces a new intrinsic incentive, the contrastive dynamic reward, together with a dynamic weighting mechanism, to improve both skill diversity and state exploration in unsupervised RL.
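To make the idea of a combined intrinsic reward concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: this summary does not spell out the formulas, so the InfoNCE-style contrastive score, the k-nearest-neighbor novelty bonus, and every function and parameter name below are assumptions chosen for illustration. The weight is simply passed in by the caller, because how ComSD varies it is not described here.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def contrastive_reward(query, keys, positive_index, temperature=0.5):
    """Hypothetical InfoNCE-style score (an assumption, not the paper's formula).

    Returns the log-probability that `query` (e.g. an embedded transition)
    matches keys[positive_index] (e.g. its own skill embedding) rather than
    any other key. The value is always <= 0.
    """
    logits = [dot(query, k) / temperature for k in keys]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[positive_index] - log_norm


def knn_exploration_reward(state, visited, k=2):
    """Hypothetical novelty bonus: log(1 + distance to the k-th nearest visited state)."""
    dists = sorted(math.dist(state, v) for v in visited)
    kth = dists[min(k, len(dists)) - 1]
    return math.log(1.0 + kth)


def combined_intrinsic_reward(r_contrastive, r_explore, weight):
    """Exploration term plus a weighted diversity term.

    `weight` is supplied by the caller; any schedule for it is an assumption.
    """
    return r_explore + weight * r_contrastive


# Usage with made-up numbers: two skills, one transition embedding.
skills = [[1.0, 0.0], [0.0, 1.0]]
r_div = contrastive_reward([1.0, 0.0], skills, positive_index=0)
r_exp = knn_exploration_reward([0.0, 0.0], [[3.0, 4.0], [0.0, 1.0]], k=1)
print(combined_intrinsic_reward(r_div, r_exp, weight=0.5))
```

In this sketch a larger weight favors behavior that is easy to attribute to its skill, while a smaller weight favors reaching states far from those already visited. That is the trade-off between skill diversity and state exploration that the contribution describes.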
Findings
Achieves state-of-the-art performance on downstream tasks.
Discovers distinguishable, far-reaching exploration skills.
Effective in complex environments like 2D mazes.
Abstract
This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. Unsupervised skill discovery seeks to acquire different useful skills without extrinsic reward via unsupervised Reinforcement Learning (RL), with the discovered skills efficiently adapting to multiple downstream tasks in various ways. However, recent advanced skill discovery methods struggle to balance state exploration and skill diversity, particularly when the potential skills are rich and hard to discern. In this paper, we propose Contrastive dynamic Skill Discovery (ComSD; code and videos: https://github.com/liuxin0824/ComSD), which generates diverse and exploratory unsupervised skills through a novel intrinsic incentive, named contrastive dynamic reward. It…
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · AI-based Problem Solving and Planning
Methods: Contrastive Learning
