Leveraging Human Feedback for Semantically-Relevant Skill Discovery
Maxence Hussonnois, Thommen George Karimpanal, Santu Rana

TL;DR
This paper introduces SRSD, a human-in-the-loop method that uses semantic labeling and human feedback to efficiently discover diverse, relevant, and safe skills in reinforcement learning environments.
Contribution
The paper presents semantic labeling, a novel feedback-efficient approach, and the SRSD framework built on it. Together they use human feedback more efficiently than preference-based methods and scale to skill spaces that contain many distinct skills.
Findings
SRSD improves semantic diversity of discovered skills.
It effectively scales to large and varied skill spaces.
SRSD outperforms preference-based methods in relevance and safety.
Abstract
Unsupervised skill discovery in reinforcement learning aims to intrinsically motivate agents to discover diverse and useful behaviours. However, unconstrained approaches can produce unsafe, unethical, or misaligned behaviours. To mitigate these risks and improve the practical desirability of discovered skills, recent work grounds the discovery process by leveraging human preference feedback. However, preference-based approaches are feedback-inefficient and inherently ill-equipped to deal with skill spaces composed of a variety of different skills such as running, jumping, walking, etc. To overcome this limitation, we introduce semantic labelling, a novel and feedback-efficient approach that leverages human cognitive strengths to identify and label semantically meaningful behaviours. Based on semantic labelling, we propose Semantically Relevant Skill Discovery (SRSD), a novel…
