Human-in-the-Loop Policy Optimization for Preference-Based Multi-Objective Reinforcement Learning
Ke Li, Han Guo

TL;DR
This paper introduces a human-in-the-loop framework for preference-based multi-objective reinforcement learning that adaptively identifies policies of interest without prior preference knowledge, improving decision-making efficiency.
Contribution
The paper proposes an interactive method that learns the decision maker's implicit preferences and uses them to guide policy optimization in MORL, reducing the decision maker's workload and the noise introduced into decision-making.
Findings
Outperforms conventional MORL algorithms that do not use preference information.
Achieves better policy relevance in robot control and smart grid management.
Effectively learns implicit preferences without prior knowledge.
Abstract
Multi-objective reinforcement learning (MORL) aims to find a set of high-performing and diverse policies that address trade-offs between multiple conflicting objectives. However, in practice, decision makers (DMs) often deploy only one or a limited number of trade-off policies. Providing too many diversified trade-off policies to the DM not only significantly increases their workload but also introduces noise in multi-criterion decision-making. With this in mind, we propose a human-in-the-loop policy optimization framework for preference-based MORL that interactively identifies policies of interest. Our method proactively learns the DM's implicit preference information without requiring any a priori knowledge, which is often unavailable in real-world black-box decision scenarios. The learned preference information is used to progressively guide policy optimization towards policies of…
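The abstract describes learning a DM's implicit preferences from interaction and using them to narrow the search to policies of interest. The following is a minimal sketch of that general idea, not the authors' method: it assumes a Bradley-Terry-style model with a linear utility over objective vectors, a simulated DM with hidden weights, and a fixed pool of candidate policies represented only by their objective values. All names (`fit_preference_weights`, `simulated_dm`, `hidden_w`, `candidates`) are hypothetical.

```python
import numpy as np

def fit_preference_weights(pairs, lr=0.5, steps=500):
    """Fit linear utility weights w from pairwise comparisons.

    pairs: list of (winner, loser) objective vectors.
    Assumed model: P(a preferred over b) = sigmoid(w . (a - b)).
    """
    diffs = np.array([np.asarray(a, float) - np.asarray(b, float)
                      for a, b in pairs])
    w = np.zeros(diffs.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-diffs @ w))      # predicted P(winner wins)
        grad = diffs.T @ (1.0 - p) / len(diffs)   # gradient of mean log-likelihood
        w += lr * grad                            # gradient ascent step
    return w

def simulated_dm(a, b, hidden_w):
    """Stand-in for a human DM: returns (preferred, other) using hidden weights."""
    return (a, b) if hidden_w @ a >= hidden_w @ b else (b, a)

rng = np.random.default_rng(0)
hidden_w = np.array([0.8, 0.2])                   # assumption: unknown to the learner
candidates = rng.uniform(0, 1, size=(20, 2))      # objective vectors of 20 policies

# Query the DM on randomly chosen pairs of candidate policies.
pairs = []
for _ in range(30):
    i, j = rng.choice(len(candidates), size=2, replace=False)
    pairs.append(simulated_dm(candidates[i], candidates[j], hidden_w))

w = fit_preference_weights(pairs)
best = candidates[np.argmax(candidates @ w)]      # policy ranked highest by learned utility
print("learned weights:", w)
print("selected objective vector:", best)
```

In an interactive loop, the learned weights would be refreshed after each round of queries and used to bias further policy optimization, rather than applied once to a fixed candidate pool as in this sketch.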
Taxonomy
Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management · Energy Efficiency and Management
Methods: Sparse Evolutionary Training
