Leveraging Human Feedback for Semantically-Relevant Skill Discovery

Maxence Hussonnois; Thommen George Karimpanal; Santu Rana

arXiv:2604.24127·cs.LG·April 28, 2026

TL;DR

This paper introduces SRSD, a human-in-the-loop method that uses semantic labeling and human feedback to efficiently discover diverse, relevant, and safe skills in reinforcement learning environments.

Contribution

It presents a novel semantic labeling approach and the SRSD framework, which together improve skill discovery by using human feedback more efficiently and scaling to complex skill spaces.

Findings

01. SRSD improves semantic diversity of discovered skills.

02. It effectively scales to large and varied skill spaces.

03. SRSD outperforms preference-based methods in relevance and safety.

Abstract

Unsupervised skill discovery in reinforcement learning aims to intrinsically motivate agents to discover diverse and useful behaviours. However, unconstrained approaches can produce unsafe, unethical, or misaligned behaviours. To mitigate these risks and improve the practical desirability of discovered skills, recent work grounds the discovery process by leveraging human preference feedback. However, preference-based approaches are feedback-inefficient and inherently ill-equipped to deal with skill spaces composed of a variety of different skills such as running, jumping, walking, etc. To overcome this limitation, we introduce semantic labelling, a novel and feedback-efficient approach that leverages human cognitive strengths to identify and label semantically meaningful behaviours. Based on semantic labelling, we propose Semantically Relevant Skill Discovery (SRSD), a novel…
