Efficient Skill Discovery via Regret-Aware Optimization
He Zhang, Ming Zhou, Shaopeng Zhai, Ying Sun, Hui Xiong

TL;DR
This paper introduces a regret-aware optimization approach for unsupervised skill discovery in reinforcement learning, improving efficiency and diversity by focusing on upgradable policy strengths and adversarial skill generation.
Contribution
It proposes a novel min-max framework with regret-guided skill generation, enhancing exploration efficiency and diversity in high-dimensional environments.
Findings
Outperforms baseline methods in efficiency and diversity.
Achieves 15% zero-shot improvement in high-dimensional environments.
Remains effective across environments of varying complexity and dimensionality.
Abstract
Unsupervised skill discovery aims to learn diverse and distinguishable behaviors in open-ended reinforcement learning. Existing methods focus on improving diversity through pure exploration, mutual information optimization, and temporal representation learning. Although they perform well on exploration, they remain limited in efficiency, especially in high-dimensional settings. In this work, we frame skill discovery as a min-max game between skill generation and policy learning, and propose a regret-aware method, built on temporal representation learning, that expands the discovered skill space in the direction of upgradable policy strength. The key insight behind the proposed method is that skill discovery is adversarial to policy learning, i.e., skills with weak strength should be explored further, while less exploration is needed for skills with converged…
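To make the regret-aware idea concrete, here is a minimal sketch of regret-weighted skill sampling over a small discrete set of candidate skills. It is not the authors' implementation: the summary does not give the paper's regret definition or describe its skill generator, so the regret proxy (clipped value improvement between two policy updates), the softmax weighting, and the names `estimate_regret`, `regret_weights`, and `sample_skills` are all assumptions made for illustration.

```python
import math
import random


def estimate_regret(value_before, value_after):
    """Hypothetical regret proxy (an assumption, not the paper's definition):
    how much the policy's value for each skill improved in the last update,
    clipped at zero. Large values suggest the skill is still "upgradable"."""
    return [max(after - before, 0.0)
            for before, after in zip(value_before, value_after)]


def regret_weights(regrets, temperature=1.0):
    """Softmax over per-skill regret estimates: higher regret, higher weight."""
    peak = max(regrets)  # subtract the max for numerical stability
    exps = [math.exp((r - peak) / temperature) for r in regrets]
    total = sum(exps)
    return [e / total for e in exps]


def sample_skills(regrets, n, temperature=1.0, rng=None):
    """Draw n skill indices, favoring skills whose policy is still improving."""
    rng = rng or random.Random(0)
    weights = regret_weights(regrets, temperature)
    return rng.choices(range(len(regrets)), weights=weights, k=n)


if __name__ == "__main__":
    # Three candidate skills; only the third is still improving noticeably.
    before = [1.0, 2.0, 0.5]
    after = [1.0, 1.9, 1.5]
    regrets = estimate_regret(before, after)
    print(regrets)
    print(regret_weights(regrets, temperature=0.5))
    print(sample_skills(regrets, n=10, temperature=0.5))
```

The sampler plays the adversarial role in this sketch: it concentrates training on skills where the policy is weakest relative to what it could reach, and lets skills with converged value fade from the sampling distribution. A lower `temperature` makes that concentration sharper.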
Taxonomy
Topics: Educational Technology and Assessment · Scheduling and Timetabling Solutions · Higher Education Learning Practices
