Convergence dynamics of Agent-to-Agent Interactions with Misaligned objectives
Romain Cosentino, Sarath Shekkizhar, Adam Earle

TL;DR
This paper presents a theoretical analysis of how agent-to-agent interactions with misaligned objectives affect convergence, bias, and robustness in simplified linear regression models, with insights applicable to multi-agent LLM systems.
Contribution
It introduces a novel theoretical framework for analyzing coupled dynamics of transformer-based agents with misaligned objectives in in-context learning.
Findings
Misalignment leads to a biased equilibrium in which neither agent reaches its target, with residual errors predictable from the objective gap.
An adversarial regime allows one agent to reach its goal while biasing the other.
Adaptive helper agents can eliminate convergence plateaus and accelerate learning.
Abstract
We develop and analyze a theoretical framework for agent-to-agent interactions in a simplified in-context linear regression setting. In our model, each agent is instantiated as a single-layer transformer with linear self-attention (LSA) trained to implement gradient-descent-like updates on a quadratic regression objective from in-context examples. We then study the coupled dynamics when two such LSA agents alternately update from each other's outputs under potentially misaligned fixed objectives. Within this framework, we characterize the generation dynamics and show that misalignment leads to a biased equilibrium where neither agent reaches its target, with residual errors predictable from the objective gap and the prompt-induced geometry. We also characterize an adversarial regime where asymmetric convergence is possible: one agent reaches its objective exactly while inducing…
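The biased equilibrium described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's model: it assumes each LSA agent can be replaced by an explicit gradient-descent step on its own quadratic objective, and that both agents alternately update one shared estimate `w`. The names `agent_step`, `simulate`, and `fixed_point`, the step size, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def agent_step(w, S, w_target, lr):
    """One gradient step on the quadratic 0.5 * (w - w_target)^T S (w - w_target)."""
    return w - lr * S @ (w - w_target)


def simulate(S_a, S_b, w_a, w_b, lr=0.1, rounds=500):
    """Agents A and B alternately update a shared estimate (assumed coupling)."""
    w = np.zeros_like(w_a)
    for _ in range(rounds):
        w = agent_step(w, S_a, w_a, lr)  # agent A pulls toward w_a
        w = agent_step(w, S_b, w_b, lr)  # agent B pulls toward w_b
    return w


def fixed_point(S_a, S_b, w_a, w_b, lr=0.1):
    """Closed-form fixed point of one full A-then-B round."""
    I = np.eye(len(w_a))
    M_a = I - lr * S_a
    M_b = I - lr * S_b
    rhs = lr * M_b @ S_a @ w_a + lr * S_b @ w_b
    return np.linalg.solve(I - M_b @ M_a, rhs)


rng = np.random.default_rng(0)
d, n = 3, 50
# Each agent sees its own synthetic prompt; S is the empirical second-moment matrix.
X_a = rng.normal(size=(n, d))
X_b = rng.normal(size=(n, d))
S_a = X_a.T @ X_a / n
S_b = X_b.T @ X_b / n

w_a = rng.normal(size=d)                 # agent A's target
w_b = w_a + np.array([1.0, 0.0, -1.0])   # agent B's target, offset by the objective gap

w_eq = simulate(S_a, S_b, w_a, w_b)
w_aligned = simulate(S_a, S_b, w_a, w_a)

print("residual to A's target:", np.linalg.norm(w_eq - w_a))
print("residual to B's target:", np.linalg.norm(w_eq - w_b))
print("aligned residual:      ", np.linalg.norm(w_aligned - w_a))
```

Under these assumptions, the equilibrium measured after B's step satisfies `w_eq - w_b = (I - M_b M_a)^{-1} lr M_b S_a (w_a - w_b)`. Both residuals are therefore linear in the objective gap, shaped by the data matrices `S_a` and `S_b`, and they vanish only when the two targets coincide.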
Peer Reviews
Decision: ICLR 2026 Conference Desk Rejected Submission
This paper offers an interesting angle for studying agent-to-agent interactions through in-context gradient descent in LSAs. The theoretical results show that misaligned objectives correspond to different behaviors. The experiments also extend the analysis to an LLM (GPT-5), which validates the theoretical results. The paper is well written.
* Investigating multi-LLM-agent interactions is an important and emerging problem. Although this paper offers an interesting perspective, it builds on oversimplified settings that are not obviously easy to generalize. I appreciate the authors acknowledging this in the conclusion: "move beyond controlled linear tasks and examine these mechanisms directly in large-scale LLMs." However, I believe this is an important point and worth discussing in more detail.
* There is some degree of o
1. The paper is the first to model multi-agent LLM interaction as an alternating in-context gradient optimization system, providing a mathematical formulation of inter-agent updates and a computable basis for analyzing bias propagation and convergence stability. 2. The mathematical derivations are complete and logically clear, with consistent notation, explicit assumptions, and boundary conditions; the appendix provides supplementary derivation details that enhance verifiability. 3. The proposed
1. The experiments are conducted only on a synthetic linear-regression task, without testing agent interaction in realistic language scenarios such as reasoning, writing, or code generation. This limits the explanatory power of the results for real-world multi-agent LLM collaboration. 2. The study relies on the assumption that LLM inference is equivalent to in-context gradient descent; while analytically convenient, this assumption is not strictly true for real LLM reasoning and may weaken the p
First of all, the authors identify an important and timely problem in the field of multi-agent systems (MAS) involving large language models (LLMs). The unpredictability of LLM-driven MAS and their occasional underperformance compared to single-agent systems highlight the need for a deeper understanding of agent interactions. The paper's focus on characterizing agent-to-agent interactions and its analysis of the internal state updates are novel and address a gap in the literature. The
The paper discusses white-box attacks but does not delve into potential defenses. It would be beneficial to add a short discussion regarding the strategies for eliminating or mitigating these attacks. Addressing these concerns would provide a more comprehensive understanding of the security implications and offer practical solutions for securing multi-agent systems. At the end of section 3, the suggestion to design a common goal for multi-LLMs is quite intuitive. It would be a lot better if the
1. The paper offers a clean and analytically grounded model of in-context optimization between interacting agents. 2. The theoretical results (Propositions 1–3) are mathematically sound and provide clear geometric intuition for asymmetric convergence. 3. The analysis extends the “transformers-as-optimizers” view to a two-agent setting, which is conceptually novel and well aligned with the learning theory track.
1. The paper’s core theory assumes transformers implement in-context gradient-like updates (the “transformers-as-optimizers” view) and then analyzes coupled update dynamics under that assumption. However, the GPT-5 experiments do not test emergent in-context optimization — they prompt the model with the explicit gradient formula and treat GPT-5 as an arithmetic oracle. This weakens the experimental link to the paper’s foundational claim: the GPT-5 results demonstrate correct formula execution, n
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques · Explainable Artificial Intelligence (XAI)
