Indirect reciprocity beyond pairwise interactions
Ming Wei, Xin Wang, Junyu Lu, Longzhao Liu, Yishen Jiang, Hongwei Zheng, Shaoting Tang, Feng Fu

TL;DR
This paper develops a framework for reputation-based cooperation in groups. It identifies a simple rule for stable cooperation and shows that group structure produces bistability and hysteresis in reputation dynamics, with implications for AI alignment.
Contribution
It introduces a general multiplayer indirect reciprocity model, extending classical norms to group settings and analyzing AI language models' social assessment behaviors.
Findings
Stable cooperation follows the 'all good, help; one bad, halt' rule.
Group structure causes bistability and hysteresis in reputation dynamics.
Large language models tend to be punitive but do not fully adopt the core norm.
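The 'all good, help; one bad, halt' rule in the first finding can be read as a decision function over the reputations of the other group members. The following is a minimal sketch in Python under that reading, assuming binary reputations. The names `GOOD`, `BAD`, and `should_help` are hypothetical and do not come from the paper, and the sketch does not reproduce how the paper's model assigns or updates reputations.

```python
from typing import Sequence

# Assumption: reputations are binary labels, as in classical indirect reciprocity models.
GOOD, BAD = 1, 0


def should_help(group_reputations: Sequence[int]) -> bool:
    """Illustrative action rule: help only if every co-player is GOOD.

    A single BAD co-player is enough to withhold help ("one bad, halt").
    """
    return all(rep == GOOD for rep in group_reputations)


if __name__ == "__main__":
    print(should_help([GOOD, GOOD, GOOD]))  # all good: help
    print(should_help([GOOD, BAD, GOOD]))   # one bad: halt
    print(should_help([GOOD]))              # single co-player, good
    print(should_help([BAD]))               # single co-player, bad
```

With a single co-player the function reduces to helping a good partner and refusing a bad one, which is the kind of pairwise behavior the multiplayer rule is said to recover in the pairwise limit.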
Abstract
Cooperation in groups underpins collective responses to challenges from climate governance to public goods provision, yet how moral evaluation sustains it remains poorly understood. Indirect reciprocity (cooperating to build a good reputation) is well characterized for pairwise interactions, but real collective action requires individuals to be judged against the reputational profile of an entire group. Here we develop a general framework for multiplayer indirect reciprocity and show that stable group cooperation obeys a simple organizing principle: 'all good, help; one bad, halt'. This rule is both necessary and sufficient for cooperation to emerge, and it recovers the classical leading eight norms in the pairwise limit. We further show that group structure fundamentally changes reputation dynamics: unlike pairwise models, which are monostable, multiplayer systems exhibit…
