Secret Collusion among AI Agents: Multi-Agent Deception via Steganography

Sumeet Ramesh Motwani; Mikhail Baranchuk; Martin Strohmeier; Vijay Bolina; Philip H.S. Torr; Lewis Hammond; Christian Schroeder de Witt

arXiv:2402.07510·cs.AI·July 28, 2025·1 cites

Secret Collusion among AI Agents: Multi-Agent Deception via Steganography

Sumeet Ramesh Motwani, Mikhail Baranchuk, Martin Strohmeier, Vijay Bolina, Philip H.S. Torr, Lewis Hammond, Christian Schroeder de Witt

PDF

Open Access

TL;DR

This paper investigates the potential for AI agents to secretly collude using steganography, formalizes the problem, evaluates current models' capabilities, and proposes mitigation strategies to address security risks.

Contribution

It formalizes the problem of AI agent collusion via steganography, introduces a model evaluation framework, and provides empirical analysis across modern LLMs.

Findings

01

GPT-4 shows increased steganographic capabilities

02

Current models have limited steganographic abilities

03

Monitoring is needed for future model capabilities

Abstract

Recent capability increases in large language models (LLMs) open up applications in which groups of communicating generative AI agents solve joint tasks. This poses privacy and security challenges concerning the unauthorised sharing of information, or other unwanted forms of agent coordination. Modern steganographic techniques could render such dynamics hard to detect. In this paper, we comprehensively formalise the problem of secret collusion in systems of generative AI agents by drawing on relevant concepts from both AI and security literature. We study incentives for the use of steganography, and propose a variety of mitigation measures. Our investigations result in a model evaluation framework that systematically tests capabilities required for various forms of secret collusion. We provide extensive empirical results across a range of contemporary LLMs. While the steganographic…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsComputability, Logic, AI Algorithms

MethodsPosition-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Softmax · Byte Pair Encoding · Linear Layer · Attention Is All You Need · Dropout · Multi-Head Attention