Inferring Preferences from Demonstrations in Multi-objective Reinforcement Learning

Junlin Lu; Patrick Mannion; Karl Mason

arXiv:2409.20258·cs.AI·October 1, 2024

TL;DR

This paper introduces DWPI (dynamic weight-based preference inference), an algorithm that infers agent preferences from demonstrations in multi-objective reinforcement learning and outperforms two existing preference inference algorithms in both time efficiency and inference accuracy.

Contribution

The paper presents DWPI, an algorithm that infers preferences from demonstrations without user interaction, with proven correctness and empirical improvements over the baseline algorithms.

Findings

01. DWPI outperforms the two baseline preference inference algorithms in both inference accuracy and time efficiency.

02. The algorithm maintains its performance when the demonstrations are sub-optimal.

03. Inference requires no user interaction; only demonstrations are needed.

Abstract

Many decision-making problems feature multiple objectives where it is not always possible to know the preferences of a human or agent decision-maker for different objectives. However, demonstrated behaviors from the decision-maker are often available. This research proposes a dynamic weight-based preference inference (DWPI) algorithm that can infer the preferences of agents acting in multi-objective decision-making problems from demonstrations. The proposed algorithm is evaluated on three multi-objective Markov decision processes: Deep Sea Treasure, Traffic, and Item Gathering, and is compared to two existing preference inference algorithms. Empirical results demonstrate significant improvements compared to the baseline algorithms, in terms of both time efficiency and inference accuracy. The DWPI algorithm maintains its performance when inferring preferences for sub-optimal…
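To make the problem setting concrete, the toy sketch below shows what "inferring preferences from demonstrations" can mean under one common assumption: that preferences are a weight vector used to linearly scalarise a vector-valued return. It is not the DWPI algorithm, whose details are not given here. The function name `infer_weight_range`, the two-objective returns, and the grid search are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical vector returns (objective 1, objective 2) of three known policies.
# These numbers are invented for illustration, not taken from the paper.
policy_returns = np.array([
    [1.0, -1.0],
    [5.0, -3.0],
    [8.0, -8.0],
])


def infer_weight_range(demo_return, returns, grid_size=101):
    """Return every weight vector (w, 1 - w) on a grid that explains the demonstration.

    Assumes two objectives and linear scalarisation: the decision-maker is
    taken to pick the policy that maximises the dot product weights . return.
    """
    # Match the (possibly noisy) demonstration to the closest known policy.
    distances = np.linalg.norm(returns - demo_return, axis=1)
    demo_idx = int(np.argmin(distances))

    consistent = []
    for w in np.linspace(0.0, 1.0, grid_size):
        weights = np.array([w, 1.0 - w])
        # Keep the weights if they make the demonstrated policy the best one.
        if int(np.argmax(returns @ weights)) == demo_idx:
            consistent.append(weights)
    return np.array(consistent)


# A slightly noisy demonstration of the second policy.
consistent = infer_weight_range(np.array([4.6, -3.4]), policy_returns)
print(consistent[:, 0].min(), consistent[:, 0].max())
```

In this toy setting the demonstration only narrows the preference down to an interval of weights, because many weight vectors lead to the same optimal policy. The brute-force grid search is used here only because it is easy to read; it makes no claim about how DWPI or the baseline algorithms perform inference.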

Code & Models

Repositories

JLu2022/DWPI (tf, official)
