Oryx: a Scalable Sequence Model for Many-Agent Coordination in Offline MARL

Claude Formanek; Omayma Mahjoub; Louay Ben Nessir; Sasha Abramowitz; Ruan de Kock; Wiem Khlifi; Daniel Rajaonarivonivelomanantsoa; Simon Du Toit; Arnol Fokam; Siddarth Singh; Ulrich Mbou Sob; Felix Chalumeau; Arnu Pretorius

arXiv:2505.22151·cs.LG·October 31, 2025

TL;DR

Oryx is a new offline multi-agent reinforcement learning algorithm that improves coordination in complex environments by combining autoregressive policies with implicit constraint Q-learning, achieving state-of-the-art results across diverse benchmarks.

Contribution

The paper introduces Oryx, a novel offline MARL algorithm that effectively handles long-horizon coordination in many-agent settings using a retention-based architecture and implicit constraint Q-learning.
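As a rough illustration of what "retention-based" means, the sketch below assumes the standard retention recurrence from the Retentive Network literature, with a single scalar decay `gamma`. The function names are hypothetical, and Sable's and Oryx's actual retention blocks may differ in details not stated here. It shows that the quadratic "parallel" form and the constant-memory recurrent form give the same outputs, which is the property that makes retention attractive for long trajectories.

```python
def retention_parallel(qs, ks, vs, gamma):
    """Direct form: o_n = sum over m <= n of gamma^(n-m) * (q_n . k_m) * v_m."""
    outputs = []
    for n, q in enumerate(qs):
        out = [0.0] * len(vs[0])
        for m in range(n + 1):
            # Decayed similarity between the current query and an earlier key.
            score = gamma ** (n - m) * sum(a * b for a, b in zip(q, ks[m]))
            out = [o + score * v for o, v in zip(out, vs[m])]
        outputs.append(out)
    return outputs


def retention_recurrent(qs, ks, vs, gamma):
    """Recurrent form: S_n = gamma * S_{n-1} + outer(k_n, v_n); o_n = q_n S_n."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # fixed-size memory
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        # Decay the old state and add the new key/value outer product.
        state = [[gamma * state[i][j] + k[i] * v[j] for j in range(d_v)]
                 for i in range(d_k)]
        outputs.append([sum(q[i] * state[i][j] for i in range(d_k))
                        for j in range(d_v)])
    return outputs
```

The recurrent form keeps only a `d_k` by `d_v` state regardless of sequence length, whereas the direct form revisits every earlier step.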

Findings

01 Achieves state-of-the-art performance on more than 80% of the 65 tested datasets.

02 Demonstrates robust generalization across domains with many agents.

03 Effectively scales to complex, long-horizon tasks.

Abstract

A key challenge in offline multi-agent reinforcement learning (MARL) is achieving effective many-agent multi-step coordination in complex environments. In this work, we propose Oryx, a novel algorithm for offline cooperative MARL to directly address this challenge. Oryx adapts the recently proposed retention-based architecture Sable and combines it with a sequential form of implicit constraint Q-learning (ICQ), to develop a novel offline autoregressive policy update scheme. This allows Oryx to solve complex coordination challenges while maintaining temporal coherence over long trajectories. We evaluate Oryx across a diverse set of benchmarks from prior works -- SMAC, RWARE, and Multi-Agent MuJoCo -- covering tasks of both discrete and continuous control, varying in scale and difficulty. Oryx achieves state-of-the-art performance on more than 80% of the 65 tested datasets, outperforming…
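The combination of an autoregressive policy with ICQ described above can be pictured with a minimal sketch. It assumes the usual ICQ-style policy objective, a behaviour-cloning loss weighted by a softmax of Q-values over the minibatch with temperature `alpha`, together with an autoregressive factorisation of the joint policy over agents. All function names are hypothetical, and the exact sequential update used by Oryx is not given in this abstract, so this is an illustration rather than the authors' implementation.

```python
import math


def softmax_weights(q_values, alpha):
    """Batch-normalised weights exp(Q / alpha) / sum(exp(Q / alpha))."""
    scaled = [q / alpha for q in q_values]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def joint_log_prob(per_agent_log_probs):
    """Autoregressive factorisation: the joint log-probability is the sum of
    each agent's log-probability given the actions of the agents before it."""
    return sum(per_agent_log_probs)


def weighted_policy_loss(q_values, batch_log_probs, alpha):
    """Negative log-likelihood of dataset actions, weighted by softmax(Q / alpha).

    q_values:        one Q estimate per dataset sample (assumed given).
    batch_log_probs: per sample, a list of per-agent log-probabilities.
    """
    weights = softmax_weights(q_values, alpha)
    return -sum(w * joint_log_prob(lp)
                for w, lp in zip(weights, batch_log_probs))
```

Because the loss only evaluates actions that appear in the dataset, the policy is never queried on out-of-distribution actions; a smaller `alpha` concentrates the weight on the highest-valued samples.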

Videos

Oryx: a Scalable Sequence Model for Many-Agent Coordination in Offline MARL · SlidesLive