Talk, Judge, Cooperate: Gossip-Driven Indirect Reciprocity in Self-Interested LLM Agents

Shuhui Zhu; Yue Lin; Shriya Kaistha; Wenhao Li; Baoxiang Wang; Hongyuan Zha; Gillian K. Hadfield; Pascal Poupart

arXiv:2602.07777 · cs.MA · May 20, 2026

TL;DR

This paper introduces ALIGN, a framework enabling decentralized LLM agents to share gossip, build reputations, and foster cooperation, thereby improving indirect reciprocity and social cohesion.

Contribution

The paper presents ALIGN, a novel gossip-based system that enhances reputation formation and cooperation among self-interested LLM agents in decentralized settings.

Findings

01. ALIGN improves indirect reciprocity among LLM agents.

02. Stronger reasoning in LLMs promotes incentive-aligned cooperation.

03. Chat models tend to over-cooperate even when it is strategically suboptimal.

Abstract

Indirect reciprocity, which means helping those who have helped others, is difficult to sustain among decentralized, self-interested LLM agents without reliable reputation systems. We address this challenge with the Agentic Linguistic Gossip Network (ALIGN), an automated framework that enables decentralized agents to form reputations, evaluate trustworthiness, and coordinate social norms by strategically sharing open-ended gossip with hierarchical tones. We demonstrate that ALIGN consistently improves indirect reciprocity and resists malicious entrants by identifying and ostracizing defectors. Notably, we find that stronger reasoning capabilities in LLMs lead to more incentive-aligned cooperation, whereas chat models often over-cooperate even when strategically suboptimal. These results suggest that leveraging LLM reasoning through decentralized gossip is a promising path for…
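To make the idea of indirect reciprocity concrete, here is a minimal sketch of a repeated donation game with a shared reputation table. It is not the ALIGN framework: there are no LLMs and no open-ended gossip, and reputation is reduced to a single binary flag. The strategy labels (`discriminator`, `defector`), the function name, and the `benefit`/`cost` values are assumptions chosen only for illustration.

```python
from itertools import permutations


def play_donation_game(strategies, rounds=5, benefit=3.0, cost=1.0):
    """Toy indirect-reciprocity loop (illustrative only, not the paper's method).

    strategies maps an agent name to "discriminator" (helps recipients in
    good standing) or "defector" (never helps). Both labels are hypothetical.
    """
    reputation = {name: True for name in strategies}  # everyone starts in good standing
    payoff = {name: 0.0 for name in strategies}

    for _ in range(rounds):
        # Every ordered (donor, recipient) pair meets once per round.
        for donor, recipient in permutations(sorted(strategies), 2):
            helps = strategies[donor] == "discriminator" and reputation[recipient]
            if helps:
                payoff[donor] -= cost
                payoff[recipient] += benefit
            # Stand-in for gossip: the donor's standing is updated only when
            # the recipient was in good standing, so refusing a known
            # defector does not damage the donor.
            if reputation[recipient]:
                reputation[donor] = helps

    return payoff, reputation


payoff, reputation = play_donation_game(
    {"a": "discriminator", "b": "discriminator", "d": "defector"}
)
print(payoff)      # {'a': 9.0, 'b': 9.0, 'd': 6.0}
print(reputation)  # {'a': True, 'b': True, 'd': False}
```

In this toy run the defector collects help only in the first round, loses standing, and is refused afterwards, while the two discriminators keep helping each other and end up ahead. That is the ostracism dynamic the abstract describes, in a heavily simplified form.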

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. The paper is well-written. The propositions are easy to follow, the setup of the method and experiments is extremely clear, and the figures are very well done. I think Figure 6 could be made larger, but it clearly shows that across models, ALIGN tends to achieve higher discounted returns.
2. The experiments are rigorous in their testing of multiple open- and closed-source models and of chat vs. reasoning variants.
3. The paper is well-motivated in that communication between agents as necessary…

Weaknesses

Limited Domain - While I find indirect reciprocity games to be an interesting testbed for granularly testing the authors' ideas about gossip, it seems limited given the vast amount of work on mixed competitive-cooperative games. There exist more challenging games, such as StarCraft or Sequential Social Dilemmas (https://arxiv.org/abs/1810.08647), that test cooperation strategies among agents in much more complex environments. I am curious as to how the authors would compare and contrast their…

Reviewer 02 · Rating 2 · Confidence 4

Strengths

This paper extends the reputation mechanisms used in previous social dilemma studies and adapts them to LLM agents. The writing is clear and well organized.

Weaknesses

1. Although the paper extends existing methods, the evaluation settings remain limited to classic matrix games. This makes the conclusions rather narrow. The capabilities of LLM agents would allow the proposed mechanism to be tested in more realistic and complex environments.
2. The experimental results themselves are not particularly novel and do not clearly demonstrate what is unique about gossip as a mechanism.
3. The theoretical model cannot quantitatively capture the decision-making process…

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. This paper integrates established theories of indirect reciprocity into LLM-based multi-agent systems, proposing ALIGN, a practical and decentralized framework designed for self-interested agents.
2. The experimental results convincingly demonstrate the effectiveness of ALIGN. A particularly interesting finding, as shown in Table 3, is that some chat models achieve positive rewards despite defection being the unique SPE. This divergence from the theoretical equilibrium could serve as a potenti…

Weaknesses

1. The prompts used for the LLM agents are not specified. It is unclear how environmental context and gossip messages are structured and presented to the agents in both finite- and infinite-horizon settings, which hinders reproducibility.
2. The importance of the gossip protocol (Section 4.2) is not sufficiently demonstrated. Although it incorporates five levels of judgment, what ultimately matters remains the binary signal (cooperate/defect).
3. There is no ablation study on the reasoning method…

Code & Models

Repositories

shuhui-zhu/ALIGN (GitHub)

Taxonomy

Topics: Language and cultural evolution · Evolutionary Game Theory and Cooperation · Distributed Control Multi-Agent Systems