Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning

Zhiyuan Hu; Yunhai Hu; Juncheng Liu; Shuyue Stella Li; Yucheng Wang; Zhen Xu; See-Kiong Ng; Anh Tuan Luu; Xinxing Xu; Bryan Hooi; Cynthia Breazeal; Hae Won Park

arXiv:2601.09667·cs.AI·January 16, 2026

TL;DR

This paper introduces MATTRL, a test-time reinforcement learning framework for multi-agent systems. It improves reasoning accuracy on benchmarks in medicine, math, and education by injecting structured textual experience into agent deliberation at inference time, which also improves robustness and efficiency.

Contribution

The paper proposes a novel test-time reinforcement learning approach for multi-agent systems, enabling stable, distribution-shift-robust reasoning without additional tuning.

Findings

01. MATTRL improves accuracy by 3.67% over multi-agent baselines.

02. MATTRL improves accuracy by 8.67% over single-agent baselines.

03. Ablation studies reveal the impact of different credit-assignment schemes.

Abstract

Multi-agent systems have evolved into practical LLM-driven collaborators for many applications, gaining robustness from diversity and cross-checking. However, multi-agent RL (MARL) training is resource-intensive and unstable: co-adapting teammates induce non-stationarity, and rewards are often sparse and high-variance. Therefore, we introduce Multi-Agent Test-Time Reinforcement Learning (MATTRL), a framework that injects structured textual experience into multi-agent deliberation at inference time. MATTRL forms a multi-expert team of specialists for multi-turn discussions, retrieves and integrates test-time experiences, and reaches consensus for final decision-making. We also study credit assignment for constructing a turn-level experience pool, then reinjecting it into the dialogue. Across challenging benchmarks in medicine, math, and education, MATTRL improves accuracy by an…
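
The loop the abstract describes (retrieve experience, inject it into each specialist's turn, then reach consensus) can be illustrated with a toy sketch. This is not the authors' implementation: the `Experience` schema, the token-overlap retrieval, the scalar `credit` weight, and the majority-vote consensus are all assumptions made for illustration, and the LLM calls are left out entirely.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Experience:
    """One turn-level textual experience (hypothetical schema)."""
    text: str
    credit: float  # assumed scalar credit assigned to the originating turn


def tokenize(s: str) -> set[str]:
    """Lowercased whitespace tokens; a stand-in for a real retriever."""
    return set(s.lower().split())


def retrieve(pool: list[Experience], query: str, k: int = 2) -> list[Experience]:
    """Return up to k experiences ranked by token overlap times credit."""
    q = tokenize(query)
    scored = [(len(q & tokenize(e.text)) * e.credit, e) for e in pool]
    scored = [(s, e) for s, e in scored if s > 0]  # drop irrelevant entries
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored[:k]]


def build_prompt(role: str, question: str, experiences: list[Experience]) -> str:
    """Inject retrieved experience into one specialist's turn prompt."""
    hints = "\n".join(f"- {e.text}" for e in experiences)
    return f"Role: {role}\nQuestion: {question}\nRelevant experience:\n{hints}"


def consensus(answers: list[str]) -> str:
    """Majority vote over specialists' answers (assumed consensus rule)."""
    return Counter(answers).most_common(1)[0][0]


pool = [
    Experience("check units before computing dose", 1.0),
    Experience("verify the base case in induction proofs", 0.5),
    Experience("dose calculations need patient weight", 0.2),
]
question = "what dose should be given"
hits = retrieve(pool, question)
prompt = build_prompt("pharmacologist", question, hits)
final = consensus(["10 mg", "5 mg", "10 mg"])
```

In this sketch a higher `credit` simply pushes an experience up the retrieval ranking; how credit is actually assigned to turns is the design question the paper's ablations examine.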

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)