CooT: Learning to Coordinate In-Context with Coordination Transformers

Huai-Chih Wang; Hsiang-Chun Chuang; Hsi-Chun Cheng; Dai-Jie Wu; Shao-Hua Sun

arXiv:2506.23549 · cs.AI · May 19, 2026

TL;DR

CooT introduces a novel in-context learning framework for multi-agent coordination that enables rapid, stable adaptation to diverse partners without fine-tuning, outperforming population-based and fine-tuning baselines.

Contribution

The paper presents CooT, a coordination transformer that generalizes across partner behaviors using in-context learning, facilitating real-time adaptation in multi-agent systems.

Findings

01. CooT outperforms population-based and fine-tuning methods on the Overcooked and Google Research Football benchmarks.

02. CooT adapts quickly to new partners and remains stable under sudden partner changes.

03. Human evaluations favor CooT as a collaborative partner.

Abstract

Effective coordination among unfamiliar partners remains a major challenge in multi-agent systems. Existing approaches, such as population-based methods, improve robustness through diversity but often lack mechanisms for efficient adaptation beyond the training distribution. Moreover, fine-tuning is impractical in few-shot settings due to its high interaction cost. To address these limitations, we propose CooT, a framework that leverages in-context learning (ICL) for real-time partner adaptation. Unlike prior ICL approaches that focus on task generalization, CooT is designed to generalize across diverse partner behaviors. Trained on trajectories from behavior-preferring agents, it learns to align actions with partner intentions purely through observation. We evaluate CooT on two challenging multi-agent benchmarks: Overcooked and Google Research Football. Results show that CooT consistently…
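The abstract describes adaptation "purely through observation," with no fine-tuning at test time. The sketch that follows is a deliberately simplified toy and not CooT. CooT uses a transformer trained on trajectories from behavior-preferring agents, whereas here a hand-written rule stands in for the learned model. The subtask names, the `ContextConditionedPolicy` class, and the window size are all hypothetical. What the toy shares with the ICL setting is that nothing is updated at test time. Its behavior shifts only because the context of recent partner actions changes, and a bounded context window lets it recover after an abrupt partner switch.

```python
from collections import Counter, deque

# Hypothetical Overcooked-style subtasks; not taken from the paper.
SUBTASKS = ("fetch_onion", "fetch_dish")
COMPLEMENT = {"fetch_onion": "fetch_dish", "fetch_dish": "fetch_onion"}


class ContextConditionedPolicy:
    """Toy stand-in for an in-context policy (hypothetical, not CooT).

    Nothing is trained or updated at test time: the action is a fixed
    function of the context window of recent partner actions.
    """

    def __init__(self, window: int = 5) -> None:
        # Bounded context: old partner actions drop out, so a partner switch
        # is eventually reflected in the context.
        self.context = deque(maxlen=window)

    def observe(self, partner_action: str) -> None:
        self.context.append(partner_action)

    def act(self) -> str:
        if not self.context:
            return SUBTASKS[0]  # default before any evidence about the partner
        # Infer the partner's preferred subtask from the context...
        inferred, _ = Counter(self.context).most_common(1)[0]
        # ...and take the complementary one to avoid duplicated effort.
        return COMPLEMENT[inferred]


# A partner that prefers onions for 6 steps, then abruptly switches to dishes.
partner_stream = ["fetch_onion"] * 6 + ["fetch_dish"] * 6
policy = ContextConditionedPolicy(window=5)
actions = []
for partner_action in partner_stream:
    policy.observe(partner_action)
    actions.append(policy.act())
print(actions)  # 8 x 'fetch_dish', then 4 x 'fetch_onion'
```

With a window of 5, the toy policy keeps responding to the old preference for the first two steps after the switch and flips at the third, once the new behavior dominates the context. A longer window would trade slower recovery for more robustness to noisy partner actions.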

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. The paper tackles an important and underexplored problem of partner-centric coordination rather than task-oriented learning. This focus makes the approach more realistic for real-world multi-agent and human-AI interaction settings, where uncertainty often arises from the partner’s changing behavior rather than from the task itself.
2. CooT achieves few-shot adaptation without gradient updates, leveraging contextual information from recent interactions to adjust its policy online. This enables

Weaknesses

1. While the paper covers most technical aspects, several implementation and evaluation details are not sufficiently explained in either the main text or the supplementary material, requiring the reader to refer to prior work. For example, the ZSC-Eval-based evaluation pipeline is only briefly described, with the similarity metric computation and partner selection process largely assumed from the original ZSC-Eval paper.
2. The paper lacks deeper analysis of the evaluation results in Table 1. For instance, it

Reviewer 02 · Rating 4 · Confidence 4

Strengths

(1) The concept of in-context adaptation for multi-agent generalization is interesting. The method's relation to LLMs is particularly interesting.
(2) The way the experiments are set up makes sense to me.

Weaknesses

(1) First of all, ZSC (zero-shot coordination, by Treutlein et al., 2021) and AHT (ad-hoc teamwork, by Stone et al., 2010) are two different things. The paper falsely relates itself to ZSC, which refers to training the same algorithm to always converge to the same convention, rather than to AHT, which generalizes to previously unseen teammates.
- Treutlein, J., Dennis, M., Oesterheld, C., & Foerster, J. (2021, July). A new formalism, method and open issues for zero-shot coordination. In Interna

Reviewer 03 · Rating 2 · Confidence 5

Strengths

1. The paper is generally well written and easy to follow. The motivation and the problem formulation are clearly stated. The figures are clear, visually consistent, and effectively convey both the architecture and the experimental results.
2. The paper provides an extensive and carefully designed ablation study. The authors include experiments on non-stationary or changing partners and on adaptation over multiple interaction episodes. The ablation studies are detailed, demonstrating CooT's capabilit

Weaknesses

1. **Insufficient literature review**: The paper overlooks several highly relevant works in adaptive coordination and partner modeling, such as PACE [1], GSCU [2], and LIAM [3]. These works similarly address how to infer a partner’s latent policy or behavioral intent from interaction history and adapt one’s own policy accordingly. In particular, PACE directly studies peer adaptation with context-aware exploration, which is conceptually very close to the proposed “in-context coordination” problem

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)