Thought Virus: Viral Misalignment via Subliminal Prompting in Multi-Agent Systems

Moritz Weckbecker; Jonas Müller; Ben Hagag; Michael Mulet

arXiv:2603.00131 · cs.MA · March 3, 2026

TL;DR

This paper shows that a single subliminally prompted agent can spread a weakening but persistent bias across a multi-agent system. The bias may degrade the truthfulness of other agents, which points to a new attack vector for AI alignment and security.

Contribution

The paper shows that subliminal prompts can propagate bias through multi-agent systems, a security concern that prior work on user-LLM interactions had left unexplored.

Findings

01 Bias weakens but persists as it spreads through the network, across two topologies

02 Subliminal prompting of a single agent may degrade the truthfulness of other agents on multiple-choice TruthfulQA

03 The phenomenon poses new security risks in multi-agent AI systems

Abstract

Subliminal prompting is a phenomenon in which language models are biased towards certain concepts or traits through prompting with semantically unrelated tokens. While prior work has examined subliminal prompting in user-LLM interactions, potential bias transfer in multi-agent systems and its associated security implications remain unexplored. In this work, we show that a single subliminally prompted agent can spread a weakening but persisting bias throughout its entire network. We measure this phenomenon across 6 agents using two different topologies, observing that the transferred concept maintains an elevated response rate throughout the network. To exemplify potential misalignment risks, we assess network performance on multiple-choice TruthfulQA, showing that subliminal prompting of a single agent may degrade the truthfulness of other agents. Our findings reveal that subliminal…
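The abstract's measurement (a per-agent response rate for the transferred concept, tracked across a network topology) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `chain` and `star` layouts, the substring-based rate metric, and the sample responses are hypothetical and are not taken from the paper.

```python
from collections import deque


def hop_distances(adjacency, source):
    """Breadth-first search: hops from `source` to every reachable agent."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def concept_response_rate(responses, concept):
    """Fraction of responses mentioning `concept` (case-insensitive substring)."""
    if not responses:
        return 0.0
    hits = sum(concept.lower() in r.lower() for r in responses)
    return hits / len(responses)


# Hypothetical 6-agent topologies; agent 0 is the subliminally prompted one.
# These layouts are illustrative and not necessarily those used in the paper.
chain = {i: [i + 1] for i in range(5)}  # 0 -> 1 -> 2 -> 3 -> 4 -> 5
star = {0: [1, 2, 3, 4, 5]}             # agent 0 messages every other agent

# Made-up responses, used only to exercise the metric.
sample = ["My favorite animal is the owl.", "I like cats.", "Owl, definitely.", "Dogs."]

print(hop_distances(chain, 0))
print(hop_distances(star, 0))
print(concept_response_rate(sample, "owl"))
```

In an experiment of this kind, one would collect each agent's responses, compute its rate with `concept_response_rate`, and compare it against the agent's hop distance from the prompted agent to see how the bias weakens along the network.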

Taxonomy

Topics: Topic Modeling · Authorship Attribution and Profiling · Adversarial Robustness in Machine Learning