Human-in-the-Loop Policy Optimization for Preference-Based Multi-Objective Reinforcement Learning

Ke Li; Han Guo

arXiv:2401.02160 · cs.NE · January 5, 2024 · 2 cites

Open Access

TL;DR

This paper introduces a human-in-the-loop framework for preference-based multi-objective reinforcement learning (MORL). It learns a decision maker's preferences interactively, without prior preference knowledge, and steers policy optimization toward the policies of interest instead of presenting a large set of trade-offs.

Contribution

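The paper proposes an interactive method that learns the decision maker's implicit preferences and uses them to guide policy optimization in MORL. This reduces the decision maker's workload and the noise that too many trade-off policies introduce into decision-making.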

Findings

01. Outperforms conventional MORL algorithms without preference information.

02. Achieves better policy relevance in robot control and smart grid management.

03. Effectively learns implicit preferences without prior knowledge.

Abstract

Multi-objective reinforcement learning (MORL) aims to find a set of high-performing and diverse policies that address trade-offs between multiple conflicting objectives. However, in practice, decision makers (DMs) often deploy only one or a limited number of trade-off policies. Providing too many diversified trade-off policies to the DM not only significantly increases their workload but also introduces noise in multi-criterion decision-making. With this in mind, we propose a human-in-the-loop policy optimization framework for preference-based MORL that interactively identifies policies of interest. Our method proactively learns the DM's implicit preference information without requiring any a priori knowledge, which is often unavailable in real-world black-box decision scenarios. The learned preference information is used to progressively guide policy optimization towards policies of…
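
The interactive idea in the abstract (query the DM, learn a preference model, then focus on the policies it favors) can be illustrated with a small sketch. This is not the method proposed in the paper. It swaps in a much simpler technique, version-space elimination over a grid of linear scalarization weights, and every name and number in it (`candidates`, `ask_dm`, `_HIDDEN_WEIGHTS`, the returns) is a hypothetical assumption made for illustration. The human DM is replaced by a simulated oracle.

```python
from itertools import combinations

# Hypothetical two-objective returns for three candidate policies (illustrative only).
candidates = {
    "A": (1.0, 0.0),
    "B": (0.7, 0.6),
    "C": (0.1, 1.0),
}

def utility(weights, returns):
    """Linear scalarization: weighted sum of objective returns."""
    return sum(w * r for w, r in zip(weights, returns))

# Stand-in for the human DM: a hidden weight vector the learner never reads directly.
_HIDDEN_WEIGHTS = (0.6, 0.4)

def ask_dm(name_a, name_b):
    """Return (preferred, rejected) for one pairwise query to the simulated DM."""
    ua = utility(_HIDDEN_WEIGHTS, candidates[name_a])
    ub = utility(_HIDDEN_WEIGHTS, candidates[name_b])
    return (name_a, name_b) if ua >= ub else (name_b, name_a)

# Candidate preference models: weight vectors on a coarse grid over the 2-D simplex.
hypotheses = [(i / 10, 1 - i / 10) for i in range(11)]

# Interactive loop: each answer removes the weight vectors that contradict it.
for name_a, name_b in combinations(candidates, 2):
    preferred, rejected = ask_dm(name_a, name_b)
    hypotheses = [
        w for w in hypotheses
        if utility(w, candidates[preferred]) > utility(w, candidates[rejected])
    ]

def mean_utility(name):
    """Average utility of a policy over the weight vectors still consistent."""
    return sum(utility(w, candidates[name]) for w in hypotheses) / len(hypotheses)

# The policy the learned preference model points to.
policy_of_interest = max(candidates, key=mean_utility)
print(hypotheses, policy_of_interest)
```

The learner only sees pairwise answers, never the hidden weights, which mirrors the "no a priori knowledge" setting in spirit. A real system would query a person, tolerate inconsistent answers, and feed the learned preference back into policy training instead of selecting among a fixed set.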

Taxonomy

Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management · Energy Efficiency and Management

Methods: Sparse Evolutionary Training