Cooperation and Exploitation in LLM Policy Synthesis for Sequential Social Dilemmas
Víctor Gallego

TL;DR
This paper studies how large language models can generate and iteratively refine programmatic policies for multi-agent social dilemmas, and reports that feedback containing social metrics yields policies that match or outperform those refined from scalar reward alone.
Contribution
It introduces a framework for LLM-based policy synthesis, studies feedback engineering by comparing sparse feedback (scalar reward only) with dense feedback (reward plus social metrics), and shows that dense feedback improves cooperative strategies in social dilemmas.
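The difference between the two feedback modes can be illustrated with a small sketch. The `EvalResult` container, the `build_feedback` function, and the text layout are hypothetical names chosen for illustration; the paper's actual prompt format is not given here.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Self-play evaluation summary (hypothetical container, not from the paper)."""
    reward: float
    efficiency: float
    equality: float
    sustainability: float
    peace: float


def build_feedback(result: EvalResult, mode: str) -> str:
    """Format evaluation results as the text shown to the LLM during refinement."""
    if mode not in ("sparse", "dense"):
        raise ValueError(f"unknown feedback mode: {mode!r}")
    # Sparse feedback: scalar reward only.
    lines = [f"mean reward: {result.reward:.2f}"]
    if mode == "dense":
        # Dense feedback: reward plus the four social metrics.
        lines += [
            f"efficiency: {result.efficiency:.2f}",
            f"equality: {result.equality:.2f}",
            f"sustainability: {result.sustainability:.2f}",
            f"peace: {result.peace:.2f}",
        ]
    return "\n".join(lines)


example = EvalResult(reward=1.5, efficiency=0.8, equality=0.9,
                     sustainability=40.0, peace=3.5)
sparse_text = build_feedback(example, "sparse")
dense_text = build_feedback(example, "dense")
```

Under this sketch the only thing that changes between conditions is the text passed back to the model, which keeps the comparison between sparse and dense feedback controlled.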
Findings
Dense feedback matches or exceeds sparse feedback on all reported metrics.
Social metrics guide LLMs toward effective cooperative strategies.
LLMs can be manipulated through adversarial attacks, highlighting safety concerns.
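As a rough illustration of what a social metric measures, the sketch below computes two of them from per-agent episode returns. The formulas are an assumption: they follow definitions commonly used in the Sequential Social Dilemma literature (efficiency as total return per timestep, equality as one minus the Gini coefficient), and the paper may define them differently. The function names are hypothetical.

```python
from typing import List


def efficiency(returns: List[float], horizon: int) -> float:
    """Collective return per timestep (assumed definition)."""
    return sum(returns) / horizon


def equality(returns: List[float]) -> float:
    """One minus the Gini coefficient of per-agent returns (assumed definition).

    Assumes non-negative returns; an all-zero episode is treated as
    perfectly equal by convention.
    """
    n = len(returns)
    total = sum(returns)
    if total == 0:
        return 1.0
    pairwise = sum(abs(a - b) for a in returns for b in returns)
    return 1.0 - pairwise / (2 * n * total)


# Two agents, 100 timesteps: one agent collects everything.
unequal = equality([10.0, 0.0])
# Same total return, split evenly.
equal = equality([5.0, 5.0])
```

Both episodes have the same efficiency, so a scalar reward alone cannot tell them apart, while the equality metric can.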
Abstract
We study LLM policy synthesis: using a large language model to iteratively generate programmatic agent policies for multi-agent environments. Rather than training neural policies via reinforcement learning, our framework prompts an LLM to produce Python policy functions, evaluates them in self-play, and refines them using performance feedback across iterations. We investigate feedback engineering (the design of what evaluation information is shown to the LLM during refinement), comparing sparse feedback (scalar reward only) against dense feedback (reward plus social metrics: efficiency, equality, sustainability, peace). Across two canonical Sequential Social Dilemmas (Gathering and Cleanup) and two frontier LLMs (Claude Sonnet 4.6, Gemini 3.1 Pro), dense feedback consistently matches or exceeds sparse feedback on all metrics. The advantage is largest in the Cleanup public goods game,…
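The generate, evaluate, refine loop described in the abstract might look roughly like the following. This is a minimal sketch under stated assumptions, not the paper's implementation: `synthesize`, `compile_policy`, `stub_llm`, and `toy_evaluate` are hypothetical names, the LLM is replaced by a stub that returns fixed source code, and the environment is replaced by a toy scoring rule.

```python
from typing import Callable, Tuple

Policy = Callable[[dict], str]


def compile_policy(source: str) -> Policy:
    """Turn generated source into a callable named `policy`.

    A real system should sandbox this step; exec on untrusted
    model output is unsafe.
    """
    namespace: dict = {}
    exec(source, namespace)
    return namespace["policy"]


def synthesize(llm: Callable[[str], str],
               evaluate: Callable[[Policy], Tuple[float, str]],
               iterations: int) -> Tuple[str, float]:
    """Iteratively request a policy, evaluate it, and feed results back."""
    feedback = "no feedback yet"
    best_source, best_score = "", float("-inf")
    for _ in range(iterations):
        source = llm(feedback)               # generate
        policy = compile_policy(source)
        score, feedback = evaluate(policy)   # evaluate (self-play in the paper)
        if score > best_score:               # keep the best policy seen so far
            best_source, best_score = source, score
    return best_source, best_score


def stub_llm(feedback: str) -> str:
    """Stand-in for a real LLM call: changes strategy once it sees feedback."""
    action = "stay" if feedback == "no feedback yet" else "collect"
    return f"def policy(obs):\n    return {action!r}\n"


def toy_evaluate(policy: Policy) -> Tuple[float, str]:
    """Toy scoring rule: fraction of steps on which the policy collects."""
    actions = [policy({"step": t}) for t in range(10)]
    score = sum(a == "collect" for a in actions) / len(actions)
    return score, f"mean reward: {score:.2f}"


best_source, best_score = synthesize(stub_llm, toy_evaluate, iterations=3)
```

In this sketch the feedback string is the only channel from evaluation back to the generator, which is the part of the loop that feedback engineering varies.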
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)
