MisoDICE: Multi-Agent Imitation from Unlabeled Mixed-Quality Demonstrations

The Viet Bui; Tien Mai; Hong Thanh Nguyen

arXiv:2505.18595 · cs.LG · May 27, 2025

Open Access

TL;DR

MisoDICE introduces a two-stage approach for offline multi-agent imitation learning from unlabeled mixed-quality demonstrations, combining trajectory labeling with a novel multi-agent IL algorithm to improve policy robustness.

Contribution

The paper proposes MisoDICE, a new multi-agent imitation learning algorithm that extends DICE with value decomposition, enabling effective learning from unlabeled mixed-quality data.
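The value decomposition mentioned above can be pictured with a minimal Python sketch. It assumes a linear mixer with non-negative weights, a common choice in value-decomposition methods, and is not taken from the paper. The helper `mix_local_values`, the weights, and the bias are hypothetical names and numbers used only for illustration.

```python
from typing import Sequence


def mix_local_values(local_values: Sequence[float],
                     weights: Sequence[float],
                     bias: float = 0.0) -> float:
    """Combine per-agent values into one joint value with a linear mixer.

    Hypothetical helper (an assumption, not the paper's architecture):
    non-negative weights keep the joint value non-decreasing in every
    agent's local value.
    """
    if len(local_values) != len(weights):
        raise ValueError("need exactly one weight per agent")
    if any(w < 0 for w in weights):
        raise ValueError("mixing weights must be non-negative")
    return sum(w * v for w, v in zip(weights, local_values)) + bias


# Three agents, each with a local value estimate for its own observation.
local = [1.0, 2.0, 0.5]
joint = mix_local_values(local, weights=[0.5, 0.25, 1.0], bias=0.25)
```

The point of such a decomposition is that each agent only needs a value function over its own observations, so the learner never has to enumerate the joint state-action space directly.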

Findings

01 MisoDICE outperforms existing methods on standard benchmarks.

02 Effective trajectory labeling improves imitation quality.

03 Robust policies are learned even with scarce expert data.

Abstract

We study offline imitation learning (IL) in cooperative multi-agent settings, where demonstrations have unlabeled mixed quality, containing both expert and suboptimal trajectories. Our proposed solution is structured in two stages: trajectory labeling and multi-agent imitation learning, designed jointly to enable effective learning from heterogeneous, unlabeled data. In the first stage, we combine advances in large language models and preference-based reinforcement learning to construct a progressive labeling pipeline that distinguishes expert-quality trajectories. In the second stage, we introduce MisoDICE, a novel multi-agent IL algorithm that leverages these labels to learn robust policies while addressing the computational complexity of large joint state-action spaces. By extending the popular single-agent DICE framework to multi-agent settings with a new value decomposition and…

Taxonomy

Topics: Natural Language Processing Techniques · Machine Learning and Algorithms · Hate Speech and Cyberbullying Detection