O-MAPL: Offline Multi-agent Preference Learning
The Viet Bui, Tien Mai, Hong Thanh Nguyen

TL;DR
This paper introduces an end-to-end preference-based learning framework for cooperative multi-agent reinforcement learning (MARL) that improves training efficiency and outperforms existing methods on the SMAC and MAMuJoCo benchmarks.
Contribution
It proposes a novel joint reward and policy learning method for MARL that leverages soft Q-functions and value decomposition, addressing instability issues in prior approaches.
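The summary does not spell out the objective, so the following is only a minimal sketch of how a preference loss could be written directly in terms of soft Q-functions with a value decomposition. It rests on assumptions that are not taken from the paper: rewards are recovered with the inverse soft Bellman operator r(s, a) = Q(s, a) - γ V(s'), the soft value is a log-sum-exp over actions, the mixing of per-agent values is linear with non-negative weights, and preferences follow a Bradley-Terry model. All function names and parameters (`soft_value`, `mix_linear`, `trajectory_return`, `preference_loss`, `beta`, `weights`) are hypothetical.

```python
import numpy as np

def soft_value(q_values, beta=1.0):
    """Soft value V = beta * log sum_a exp(Q(., a) / beta), computed stably."""
    q = np.asarray(q_values, dtype=float) / beta
    m = q.max(axis=-1, keepdims=True)
    return beta * (m.squeeze(-1) + np.log(np.exp(q - m).sum(axis=-1)))

def mix_linear(local_values, weights, bias=0.0):
    """Assumed linear value decomposition: total = sum_i |w_i| * local_i + bias."""
    return np.asarray(local_values) @ np.abs(weights) + bias

def trajectory_return(q_local, actions, q_local_next, weights, gamma=0.99):
    """Sum of rewards implied by the Q-functions along one trajectory.

    q_local, q_local_next: arrays of shape (T, n_agents, n_actions)
    actions: integer array of shape (T, n_agents)
    """
    # Per-agent Q-value of the action actually taken, shape (T, n_agents).
    q_taken = np.take_along_axis(q_local, actions[..., None], axis=-1).squeeze(-1)
    # Per-agent soft value of the next state, shape (T, n_agents).
    v_next = soft_value(q_local_next)
    # Mix local quantities into joint ones, shape (T,).
    q_tot = mix_linear(q_taken, weights)
    v_tot = mix_linear(v_next, weights)
    # Assumed inverse soft Bellman operator: r = Q_tot - gamma * V_tot(next).
    return (q_tot - gamma * v_tot).sum()

def preference_loss(return_preferred, return_other):
    """Bradley-Terry negative log-likelihood that the first trajectory wins."""
    return np.logaddexp(0.0, -(return_preferred - return_other))
```

Because the reward is expressed through the Q-functions, minimizing `preference_loss` over the Q-function parameters would train reward and policy together in one stage, which is the general idea behind an end-to-end approach; the paper's actual objective and mixing network may differ from this sketch.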
Findings
Outperforms existing methods on SMAC and MAMuJoCo benchmarks.
Demonstrates improved training stability and efficiency.
Effectively infers reward functions from preferences in multi-agent settings.
Abstract
Inferring reward functions from demonstrations is a key challenge in reinforcement learning (RL), particularly in multi-agent RL (MARL), where large joint state-action spaces and complex inter-agent interactions complicate the task. While prior single-agent studies have explored recovering reward functions and policies from human preferences, similar work in MARL is limited. Existing methods often involve separate stages of supervised reward learning and MARL algorithms, leading to unstable training. In this work, we introduce a novel end-to-end preference-based learning framework for cooperative MARL, leveraging the underlying connection between reward functions and soft Q-functions. Our approach uses a carefully designed multi-agent value decomposition strategy to improve training efficiency. Extensive experiments on SMAC and MAMuJoCo benchmarks show that our algorithm outperforms…
Taxonomy
Topics: Data Management and Algorithms · Rough Sets and Fuzzy Logic · Semantic Web and Ontologies
