AsymPuzl: An Asymmetric Puzzle for multi-agent cooperation

Xavier Cadet; Edward Koh; Peter Chin

arXiv:2512.03466·cs.MA·December 4, 2025

TL;DR

AsymPuzl is a minimal two-agent puzzle environment designed to evaluate communication strategies of large language models under information asymmetry, revealing how model capabilities and feedback design influence cooperative problem-solving.

Contribution

Introduces AsymPuzl, a controlled environment for studying multi-agent communication and cooperation in LLMs, highlighting the impact of model strength and feedback mechanisms.

Findings

01

Strong models reliably solve puzzles by sharing complete info in two turns.

02

Weaker models often ignore partner messages or over-correct hypotheses.

03

Feedback design significantly affects success rates, with simple self-feedback improving performance.

Abstract

Large Language Model (LLM) agents are increasingly studied in multi-turn, multi-agent scenarios, yet most existing setups emphasize open-ended role-play rather than controlled evaluation. We introduce AsymPuzl, a minimal but expressive two-agent puzzle environment designed to isolate communication under information asymmetry. Each agent observes complementary but incomplete views of a symbolic puzzle and must exchange messages to solve it cooperatively. Using a diverse set of current-generation and open-source LLMs, we show that (i) strong models such as GPT-5 and Claude-4.0 reliably converge across puzzle sizes on the solution by sharing complete information in two turns, (ii) weaker models often ignore partner messages or over-correct their hypotheses, and (iii) feedback design is non-trivial: simple self-feedback improves success rates, while detailed joint feedback can hurt…
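The "share complete information" strategy described in the abstract can be illustrated with a toy sketch. The abstract does not spell out the puzzle's structure, so everything here is an assumption made for illustration: the shape/color attributes, the function names, and the scripted agents standing in for LLMs. It is not the authors' environment.

```python
import random
from dataclasses import dataclass

# Hypothetical attribute vocabularies; the real puzzle symbols are not given here.
SHAPES = ["circle", "square", "triangle"]
COLORS = ["red", "green", "blue"]


@dataclass(frozen=True)
class Puzzle:
    shapes: tuple  # attribute visible only to agent A (assumption)
    colors: tuple  # attribute visible only to agent B (assumption)


def make_puzzle(size, seed=0):
    """Sample a random ground-truth puzzle with `size` positions."""
    rng = random.Random(seed)
    shapes = tuple(rng.choice(SHAPES) for _ in range(size))
    colors = tuple(rng.choice(COLORS) for _ in range(size))
    return Puzzle(shapes, colors)


def views(puzzle):
    """Split the puzzle into two complementary, individually incomplete views."""
    return {"shapes": puzzle.shapes}, {"colors": puzzle.colors}


def share_everything(own_view):
    """Scripted message policy: send the complete private view to the partner."""
    return dict(own_view)


def solve(own_view, partner_message):
    """Merge the private view with the partner's message and rebuild the puzzle."""
    merged = {**own_view, **partner_message}
    return list(zip(merged["shapes"], merged["colors"]))


def run_episode(size, seed=0):
    """One message per agent, then each agent submits a solution."""
    puzzle = make_puzzle(size, seed)
    view_a, view_b = views(puzzle)
    msg_a, msg_b = share_everything(view_a), share_everything(view_b)
    truth = list(zip(puzzle.shapes, puzzle.colors))
    return solve(view_a, msg_b) == truth and solve(view_b, msg_a) == truth
```

With scripted agents that share everything, `run_episode` succeeds at any puzzle size, which mirrors the observation that full disclosure reduces the task to single-agent solving. An LLM-based agent would replace `share_everything` and `solve` with model calls.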

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. Asymmetric puzzle solving with communication is an interesting topic for LLMs and would be valuable for future multi-agent system studies. I appreciate the research topic. 2. There are experiments across major open and closed LLMs. Also, there are various settings proposed, such as ambiguous / extra clues, significantly enriching the proposed benchmark.

Weaknesses

1. The design of AsymPuzl is not good enough. It is not a setting that naturally requires multi-round communication. In the base configuration, if both agents simply share all their information during the first communication round, the problem essentially degenerates into a single-agent puzzle-solving task. I believe the environment should introduce a more clever form of partial observability so that communication emerges naturally. Currently, adding noise to the communication channel feels like

Reviewer 02 · Rating 2 · Confidence 5

Strengths

1. The puzzle task is well-designed, simple, and interpretable. It provides a quantifiable and verifiable objective, making it suitable for controlled evaluation of cooperative reasoning. 2. The environment allows precise manipulation of task difficulty and reasoning requirements through parameters such as puzzle size, feedback type, clue ambiguity, and communication noise. This enables fine-grained analysis of agent cooperation. 3. The authors conduct extensive evaluation across a diverse s

Weaknesses

1. The paper lacks originality. Prior studies have already explored multi-agent cooperation under asymmetric information with a wide range of interaction mechanisms, and this work does not present any clear conceptual or methodological innovation. 2. The main contribution lies in the AsymPuzl environment for studying cooperation under asymmetric information and controlled communication. However, most findings, such as “larger puzzles are more challenging,” offer little new knowledge or insight

Reviewer 03 · Rating 2 · Confidence 4

Strengths

- The game has a clear ground truth that allows computing baselines. Its difficulty can be increased by adding more objects. It investigates a setup with distractors, such as including objects in one of the hints that are not in the ground truth.

Weaknesses

- Larger models such as GPT-5 consistently perform near-perfectly in most of the experiments. Since the paper mentions that one of its contributions is providing a testbed for agent coordination, this makes this testbed less likely to be useful for studying advanced models. - The puzzle does not necessarily involve sophisticated levels of multi-turn or strategic coordination and an inherent need for having more than one agent. As the paper mentioned repeatedly, the puzzle can be solved by exch

Reviewer 04 · Rating 2 · Confidence 4

Strengths

- The paper is well-structured and clearly written, making it accessible to readers with varying levels of familiarity with multi-agent systems and LLM research. The authors provide sufficient background context and clearly articulate their research questions and methodology. - The AsymPuzl game design is elegant and intuitive, providing a straightforward yet effective framework for measuring cooperation and communication between two LLM agents. The puzzle's simplicity allows for controlled experi

Weaknesses

- From my perspective, the proposed paper does not fit well within the scope of ICLR as it primarily focuses on evaluating the capabilities of existing LLMs through prompting strategies rather than introducing novel methodologies, architectures, or learning algorithms. While the AsymPuzl environment itself is a useful contribution, the paper's core findings are largely empirical observations about how different LLMs perform under various communication conditions. The work would benefit from prop

Taxonomy

Topics: Language and cultural evolution · Topic Modeling · Multimodal Machine Learning Applications