Beyond Reward Suppression: Reshaping Steganographic Communication Protocols in MARL via Dynamic Representational Circuit Breaking
Liu Hung Ming

TL;DR
This paper introduces DRCB, a novel architectural defense for decentralized MARL that detects and mitigates steganographic collusion by converting unobservable messages into auditable objects and applying escalating interventions.
Contribution
The paper presents DRCB, a new method utilizing VQ-VAE and statistical monitoring to detect covert agent communication, enabling pre-deployment auditing for autonomous systems.
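As a rough illustration of how a vector-quantization bottleneck can turn continuous messages into something auditable, the sketch below maps each message vector to its nearest codebook entry and summarizes usage as a histogram. It is not the paper's implementation: the function names `quantize` and `code_usage`, the toy codebook, and the message values are all assumptions made for this example, and a real VQ-VAE would also train the encoder and codebook.

```python
import numpy as np

def quantize(messages: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each message vector (N, D) to the index of its nearest code (K, D)."""
    # Pairwise squared Euclidean distances, shape (N, K).
    diffs = messages[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    return np.argmin(dists, axis=1)

def code_usage(indices: np.ndarray, num_codes: int) -> np.ndarray:
    """Empirical distribution over codebook indices (the auditable object)."""
    counts = np.bincount(indices, minlength=num_codes).astype(float)
    return counts / counts.sum()

# Hypothetical toy data: 3 codes and 4 messages in a 2-D latent space.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
messages = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [-0.8, 0.9]])

indices = quantize(messages, codebook)
usage = code_usage(indices, len(codebook))
print(indices, usage)
```

Once messages are discrete indices, their usage distribution can be logged and compared across training windows, which is what makes statistical monitoring possible.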
Findings
DRCB improves observer accuracy from 85.8% to 93.8%.
It reduces volatility in agent behavior by 43%.
It maintains joint reward levels while the detection mechanisms are active.
Abstract
In decentralized Multi-Agent Reinforcement Learning (MARL), steganographic collusion -- where agents develop private protocols to evade monitoring -- presents a critical AI safety threat. Existing defenses, limited to behavioral or reward layers, fail to detect coordination in latent communication channels. We introduce the Dynamic Representational Circuit Breaker (DRCB), an architectural defense operating at the optimization substrate. Building on the AI Mother Tongue (AIM) framework, DRCB utilizes a Vector Quantized Variational Autoencoder (VQ-VAE) bottleneck to convert unobservable messages into auditable statistical objects. DRCB monitors signals including Jensen-Shannon Divergence drift, L2-norm codebook displacement, and Randomized Observer Pool accuracy to compute an EMA-based Collusion Score. Threshold breaches trigger four escalating interventions: dynamic adaptation,…
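The monitoring signals named in the abstract can be combined in many ways. The sketch below shows one possible arrangement, not the paper's formula. The weights, the EMA coefficient `alpha`, the four threshold values, and the choice to treat `1 - observer_acc` as the risk term are all assumptions made for illustration. The escalation level is returned as a plain integer because the intervention list is truncated in the source.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence in bits (between 0 and 1) of two histograms."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log2(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def codebook_displacement(current, reference):
    """Mean L2 distance between matching codebook vectors."""
    diff = np.asarray(current, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.linalg.norm(diff, axis=1).mean())

class CollusionMonitor:
    """Hypothetical EMA-based score; every constant here is an assumption."""

    def __init__(self, alpha=0.5, weights=(1.0, 1.0, 1.0),
                 thresholds=(0.2, 0.4, 0.6, 0.8)):
        self.alpha = alpha
        self.weights = weights
        self.thresholds = thresholds  # one per escalation level (assumed values)
        self.score = 0.0

    def update(self, jsd, displacement, observer_acc):
        w_jsd, w_disp, w_obs = self.weights
        # Assumption: lower observer accuracy means messages are harder to
        # decode, so it contributes more risk.
        raw = w_jsd * jsd + w_disp * displacement + w_obs * (1.0 - observer_acc)
        self.score = self.alpha * raw + (1.0 - self.alpha) * self.score
        return self.score

    def escalation_level(self):
        """Number of thresholds the current score has reached (0 to 4)."""
        return sum(self.score >= t for t in self.thresholds)

monitor = CollusionMonitor()
drift = js_divergence([0.25, 0.5, 0.25], [0.7, 0.2, 0.1])
shift = codebook_displacement([[0.0, 0.1], [1.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]])
monitor.update(drift, shift, observer_acc=0.9)
print(monitor.score, monitor.escalation_level())
```

The exponential moving average smooths single-step spikes, so in this sketch an intervention level is reached only when the drift persists.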
Taxonomy
Topics: Internet Traffic Analysis and Secure E-voting · Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data
