Corruption-robust Offline Multi-agent Reinforcement Learning From Human Feedback

Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban, Adish Singla, and Goran Radanović

arXiv:2603.28281 · cs.LG · April 10, 2026

TL;DR

This paper develops robust algorithms for offline multi-agent reinforcement learning from human feedback that can withstand data corruption, providing theoretical guarantees under different coverage assumptions.

Contribution

It introduces the first systematic approach to handle adversarial data corruption in offline MARLHF with provable bounds and algorithms for various coverage settings.

Findings

01

A robust estimator guarantees an $O(\epsilon^{1 - o(1)})$ bound on the Nash equilibrium gap under uniform coverage.

02

Proposed algorithms achieve an $O(\sqrt{\epsilon})$ bound under unilateral coverage.

03

A quasi-polynomial-time algorithm achieves an $O(\sqrt{\epsilon})$ CCE gap in the challenging setting.

Abstract

We consider robustness against data corruption in offline multi-agent reinforcement learning from human feedback (MARLHF) under a strong-contamination model: given a dataset $D$ of trajectory-preference tuples (each preference being an $n$-dimensional binary label vector representing each of the $n$ agents' preferences), an $\epsilon$-fraction of the samples may be arbitrarily corrupted. We model the problem using the framework of linear Markov games. First, under a uniform coverage assumption, where every policy of interest is sufficiently represented in the clean (prior to corruption) data, we introduce a robust estimator that guarantees an $O(\epsilon^{1 - o(1)})$ bound on the Nash equilibrium gap. Next, we move to the more challenging unilateral coverage setting, in which only a Nash equilibrium and its single-player deviations are covered. In this case, our proposed algorithm…
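
To make the contamination model in the abstract concrete, the following is a minimal sketch of how an $\epsilon$-fraction of a preference dataset might be altered. It is not the paper's construction: the function name `corrupt_preferences`, the tuple layout, the random choice of targets, and the label-flipping attack are all assumptions made for illustration. A strong-contamination adversary may pick its targets after inspecting the clean data and may replace whole samples, not only labels.

```python
import random


def corrupt_preferences(dataset, epsilon, seed=0):
    """Return a corrupted copy of `dataset` and the set of altered indices.

    Each sample is assumed to be (trajectory_pair, labels), where `labels`
    is a list of n binary preferences, one per agent (hypothetical layout).
    """
    rng = random.Random(seed)
    # The adversary may alter at most an epsilon-fraction of the samples.
    budget = int(epsilon * len(dataset))
    targets = set(rng.sample(range(len(dataset)), budget))

    corrupted = []
    for idx, (trajectories, labels) in enumerate(dataset):
        if idx in targets:
            # Illustrative attack only: flip every agent's preference label.
            labels = [1 - y for y in labels]
        corrupted.append((trajectories, list(labels)))
    return corrupted, targets


# Toy data: 20 samples, 3 agents, placeholder trajectory names.
clean = [((f"tau_{k}", f"tau_prime_{k}"), [k % 2, 1, 0]) for k in range(20)]
noisy, targets = corrupt_preferences(clean, epsilon=0.1)

# Count how many samples differ from the clean data.
changed = sum(a[1] != b[1] for a, b in zip(clean, noisy))
```

With 20 samples and `epsilon=0.1`, the budget is two samples, so `changed` counts exactly the two altered tuples. A robust estimator has to tolerate any such alteration without knowing which indices were touched.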
