Diversity from Human Feedback
Ren-Jian Wang, Ke Xue, Yutong Wang, Peng Yang, Haobo Fu, Qiang Fu, Chao Qian

TL;DR
This paper introduces DivHF, a method that learns behavior descriptors from human feedback and uses them to define diversity measures, so that the diversity being optimized matches human preferences.
Contribution
The paper formulates the problem of learning a behavior space from human feedback and proposes DivHF to solve it, demonstrating that the resulting diversity is better aligned with human preferences.
Findings
DivHF produces behavior descriptors more aligned with human preferences.
Integrating DivHF with MAP-Elites increases solution diversity.
DivHF outperforms data-driven approaches that do not use human feedback.
Abstract
Diversity plays a significant role in many problems, such as ensemble learning, reinforcement learning, and combinatorial optimization. How to define the diversity measure is a longstanding problem. Many methods rely on expert experience to define a proper behavior space and then obtain the diversity measure, which is, however, challenging in many scenarios. In this paper, we propose the problem of learning a behavior space from human feedback and present a general method called Diversity from Human Feedback (DivHF) to solve it. DivHF learns a behavior descriptor consistent with human preference by querying human feedback. The learned behavior descriptor can be combined with any distance measure to define a diversity measure. We demonstrate the effectiveness of DivHF by integrating it with the Quality-Diversity optimization algorithm MAP-Elites and conducting experiments on the QDax…
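The abstract's statement that a behavior descriptor can be paired with any distance measure to define diversity, and then plugged into MAP-Elites, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `toy_descriptor` is a hypothetical stand-in for a learned descriptor, mean pairwise distance is one common way to turn a distance into a diversity score (not necessarily the paper's choice), and `insert_into_archive` shows only the usual grid-archive rule of keeping the best solution per cell.

```python
import math
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

Vector = Sequence[float]
Descriptor = Callable[[Vector], Tuple[float, ...]]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance; any other distance measure could be swapped in."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def diversity(solutions: Sequence[Vector], descriptor: Descriptor,
              distance: Callable[[Vector, Vector], float] = euclidean) -> float:
    """Mean pairwise distance between descriptors (an assumed diversity score)."""
    descs = [descriptor(s) for s in solutions]
    pairs = list(combinations(descs, 2))
    if not pairs:
        return 0.0
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def toy_descriptor(solution: Vector) -> Tuple[float, ...]:
    """Hypothetical placeholder for a learned descriptor:
    the first two coordinates, clipped to [0, 1]."""
    return tuple(min(1.0, max(0.0, v)) for v in solution[:2])


Archive = Dict[Tuple[int, ...], Tuple[float, Vector]]


def insert_into_archive(archive: Archive, solution: Vector, fitness: float,
                        descriptor: Descriptor, cells_per_dim: int = 10) -> bool:
    """Grid-archive insertion: map the descriptor to a cell and keep the
    solution only if the cell is empty or the new fitness is higher."""
    desc = descriptor(solution)
    cell = tuple(min(cells_per_dim - 1, int(d * cells_per_dim)) for d in desc)
    incumbent = archive.get(cell)
    if incumbent is None or fitness > incumbent[0]:
        archive[cell] = (fitness, solution)
        return True
    return False
```

Replacing `toy_descriptor` with a model trained on human feedback would change which solutions count as different without touching the distance or the archive logic, which is the separation the abstract describes.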
Taxonomy
Topics: Reinforcement Learning in Robotics
