TL;DR
This paper introduces DWPI, a dynamic weight-based preference inference algorithm that infers agent preferences from demonstrations in multi-objective reinforcement learning. It outperforms two existing preference inference algorithms in both time efficiency and inference accuracy.
Contribution
The paper presents DWPI, a novel algorithm that infers preferences from demonstrations without user interaction. It comes with a correctness proof and shows better empirical performance than the baselines.
Findings
DWPI outperforms the two baseline algorithms in both inference accuracy and time efficiency.
DWPI maintains its performance when the demonstrations are sub-optimal.
DWPI requires no user interaction during inference.
Abstract
Many decision-making problems feature multiple objectives where it is not always possible to know the preferences of a human or agent decision-maker for different objectives. However, demonstrated behaviors from the decision-maker are often available. This research proposes a dynamic weight-based preference inference (DWPI) algorithm that can infer the preferences of agents acting in multi-objective decision-making problems from demonstrations. The proposed algorithm is evaluated on three multi-objective Markov decision processes: Deep Sea Treasure, Traffic, and Item Gathering, and is compared to two existing preference inference algorithms. Empirical results demonstrate significant improvements compared to the baseline algorithms, in terms of both time efficiency and inference accuracy. The DWPI algorithm maintains its performance when inferring preferences for sub-optimal…
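To make the problem setting concrete, here is a minimal sketch of what "inferring preferences from demonstrations" can mean. It is not the DWPI algorithm, whose details this summary does not give. It assumes preferences are a weight vector applied by linear scalarization, and it uses a naive grid search over two-objective weights. The function names and the return vectors are hypothetical and are not taken from Deep Sea Treasure or the paper.

```python
import numpy as np

def scalarize(returns, weights):
    """Linear scalarization: weighted sum of per-objective returns (an assumption)."""
    return np.asarray(returns, dtype=float) @ np.asarray(weights, dtype=float)

def consistent_weights(candidate_returns, demo_index, steps=101):
    """Naive grid search (not DWPI): keep every 2-objective weight vector
    under which the demonstrated policy has the highest scalarized return."""
    candidates = np.asarray(candidate_returns, dtype=float)
    kept = []
    for a in np.linspace(0.0, 1.0, steps):
        w = np.array([a, 1.0 - a])       # weights lie on the 1-simplex
        utilities = candidates @ w       # one scalar utility per candidate policy
        if np.argmax(utilities) == demo_index:
            kept.append(w)
    return np.array(kept)

# Hypothetical vector returns (reward objective, time penalty) for three policies.
candidate_returns = [
    [1.0, -1.0],
    [8.0, -5.0],
    [20.0, -15.0],
]

# Suppose the demonstration corresponds to the second policy (index 1).
weights = consistent_weights(candidate_returns, demo_index=1)

# Point estimate: the mean of all weight vectors consistent with the demonstration.
estimate = weights.mean(axis=0)
print(estimate)
```

The sketch returns a set of weight vectors rather than a single answer because a single demonstration usually only narrows the preference down to a region of weight space. The grid search also scales poorly with the number of objectives, which is one reason time efficiency matters when comparing inference methods.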
