Collective AI can amplify tiny perturbations into divergent decisions
Hajime Shimao, Warut Khern-am-nuai, Sung Joo Kim

TL;DR
The paper shows that collective AI systems, such as committees of deliberating LLMs, can amplify tiny differences in their input into divergent decisions, challenging the assumption that such systems are robust and predictable.
Contribution
It demonstrates that iterative multi-LLM deliberation can cause instability and divergence even under deterministic conditions, highlighting a stability problem in collective AI.
Findings
Small, meaning-preserving changes in scenario text can lead to different final decisions, even in a fully deterministic setup.
Nominally identical committee runs on deployed black-box APIs remain unstable even at temperature 0.
Committee architecture influences the degree of divergence.
Abstract
Large language models are increasingly deployed not as single assistants but as committees whose members deliberate and then vote or synthesize a decision. Such systems are often expected to be more robust than individual models. We show that iterative multi-LLM deliberation can instead amplify tiny perturbations into divergent conversational trajectories and different final decisions. In a fully deterministic self-hosted benchmark, exact reruns are identical, yet small meaning-preserving changes to the scenario text still separate over time and often alter the final recommendation. In deployed black-box API systems, nominally identical committee runs likewise remain unstable even at temperature 0, where many users expect near-determinism. Across 12 policy scenarios, these findings indicate that instability in collective AI is not only a consequence of residual platform-side…
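The deliberate-then-vote setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' benchmark: `stub_member` is a hypothetical, hash-based stand-in for a deterministic LLM call, and the option labels, member names, and round count are assumptions chosen for illustration. The sketch only shows the shape of the comparison: exact reruns of a deterministic committee are identical, while a slightly reworded scenario can be compared against the baseline to find where the transcripts first separate.

```python
import hashlib
from collections import Counter

# Hypothetical decision labels; the paper's actual options are not given here.
OPTIONS = ["approve", "reject", "defer"]


def stub_member(name: str, scenario: str, transcript: list[str]) -> str:
    """Hypothetical deterministic stand-in for one LLM committee member.

    It hashes everything the member has seen, so the same inputs always
    produce the same stance. A real system would call a model here.
    """
    payload = "\n".join([name, scenario, *transcript]).encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    return OPTIONS[digest[0] % len(OPTIONS)]


def deliberate(scenario: str, members=("A", "B", "C"), rounds: int = 4):
    """Run iterative deliberation, then take a majority vote on the last round."""
    transcript: list[str] = []
    for _ in range(rounds):
        # Every member responds to the same snapshot of the discussion so far.
        snapshot = list(transcript)
        for name in members:
            stance = stub_member(name, scenario, snapshot)
            transcript.append(f"{name}: {stance}")
    final_votes = [line.split(": ")[1] for line in transcript[-len(members):]]
    # Ties are broken by first occurrence, which keeps the vote deterministic.
    decision = Counter(final_votes).most_common(1)[0][0]
    return transcript, decision


def first_divergence(a: list[str], b: list[str]):
    """Index of the first differing message, or None if none differ."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None


if __name__ == "__main__":
    base = "Should the city fund the program?"
    reworded = "Should the city provide funding for the program?"
    t_base, d_base = deliberate(base)
    t_rerun, d_rerun = deliberate(base)
    t_pert, d_pert = deliberate(reworded)
    print("exact rerun identical:", t_base == t_rerun and d_base == d_rerun)
    print("first divergence after rewording:", first_divergence(t_base, t_pert))
    print("decisions:", d_base, "vs", d_pert)
```

Because the stub is a hash, it is far more sensitive to rewording than a real model would be; swapping `stub_member` for an actual deterministic model call would be needed to observe the gradual separation the abstract describes.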
