Sample-Efficient Policy Space Response Oracles with Joint Experience Best Response

Ariyan Bighashdel; Thiago D. Simão; Frans A. Oliehoek

arXiv:2602.06599 · cs.MA · February 9, 2026

TL;DR

This paper introduces Joint Experience Best Response (JBR), a modification to Policy Space Response Oracles (PSRO) that collects trajectories once under the current meta-strategy profile and reuses them to compute best responses for all agents simultaneously. The reuse improves sample efficiency and makes PSRO more practical in many-agent or simulator-expensive settings.

Contribution

The paper proposes JBR, a drop-in modification to PSRO that reuses a single set of joint trajectories for every agent's best response, so that environment interaction is amortized across agents instead of being repeated per agent.

Findings

1. JBR improves sample efficiency in multi-agent environments.
2. Exploration-Augmented JBR offers the best accuracy-efficiency trade-off.
3. Hybrid BR achieves near-PSRO performance with less data.

Abstract

Multi-agent reinforcement learning (MARL) offers a scalable alternative to exact game-theoretic analysis but suffers from non-stationarity and the need to maintain diverse populations of strategies that capture non-transitive interactions. Policy Space Response Oracles (PSRO) address these issues by iteratively expanding a restricted game with approximate best responses (BRs), yet per-agent BR training makes it prohibitively expensive in many-agent or simulator-expensive settings. We introduce Joint Experience Best Response (JBR), a drop-in modification to PSRO that collects trajectories once under the current meta-strategy profile and reuses this joint dataset to compute BRs for all agents simultaneously. This amortizes environment interaction and improves the sample efficiency of best-response computation. Because JBR converts BR computation into an offline RL problem, we propose…
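The data-reuse idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-agent, one-shot matrix game in which each population policy is a single pure action, and it swaps the offline RL step for a simple per-action average of observed rewards. The names `collect_joint_dataset` and `offline_best_responses` are hypothetical.

```python
import numpy as np


def collect_joint_dataset(payoffs, meta_strategies, num_samples, rng):
    """Sample joint actions once under the meta-strategy profile (2 agents).

    payoffs[i][a0, a1] is agent i's reward for the joint action (a0, a1).
    Returns (actions, rewards), each of shape (num_samples, 2).
    """
    a0 = rng.choice(len(meta_strategies[0]), size=num_samples, p=meta_strategies[0])
    a1 = rng.choice(len(meta_strategies[1]), size=num_samples, p=meta_strategies[1])
    actions = np.stack([a0, a1], axis=1)
    rewards = np.stack([payoffs[0][a0, a1], payoffs[1][a0, a1]], axis=1)
    return actions, rewards


def offline_best_responses(actions, rewards, n_actions):
    """Estimate every agent's best response from the same shared dataset.

    Assumption: the best response is approximated by the action with the
    highest mean observed reward; no new environment samples are drawn.
    """
    best = []
    for i, n in enumerate(n_actions):
        means = np.full(n, -np.inf)  # actions never observed stay at -inf
        for a in range(n):
            mask = actions[:, i] == a
            if mask.any():
                means[a] = rewards[mask, i].mean()
        best.append(int(np.argmax(means)))
    return best


# Rock-paper-scissors: rows are agent 0's action, columns are agent 1's.
A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
payoffs = [A, -A]  # zero-sum game
meta = [np.array([0.8, 0.1, 0.1]), np.array([0.1, 0.1, 0.8])]

rng = np.random.default_rng(0)
actions, rewards = collect_joint_dataset(payoffs, meta, 5000, rng)  # one shared batch
best = offline_best_responses(actions, rewards, n_actions=[3, 3])
print(best)
```

In this toy setup agent 1 mostly plays scissors and agent 0 mostly plays rock, so the analytic best responses are rock for agent 0 and paper for agent 1, and the estimate from the shared batch should recover both. Note that actions rarely played under the meta-strategy contribute few samples to the dataset, which is one reason estimating best responses from fixed data can be harder than learning with fresh interaction.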

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Adaptive Dynamic Programming Control