FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems
Fanxiao Li, Jiaying Wu, Tingchao Fu, Natasha Jaques, Wei Zhou, Min-Yen Kan

TL;DR
This paper introduces FlowSteer, a prompt-only attack that steers workflow formation in multi-agent LLM systems, and FlowGuard, a defense against it. It shows that workflow position and sycophantic framing affect how a malicious signal propagates between agents.
Contribution
The paper identifies workflow formation as a new attack surface in multi-agent LLM systems and proposes FlowSteer as an attack mechanism and FlowGuard as a defense mechanism.
Findings
FlowSteer increases malicious success by up to 55%.
FlowGuard reduces malicious success by up to 34%.
FlowSteer transfers across different MAS setups and remains effective with black-box inference.
Abstract
Multi-agent systems (MAS) powered by large language models (LLMs) increasingly adopt planner-executor architectures, where planners convert prompts into subtasks, roles, dependencies, and routing paths. This flexibility enables adaptive coordination, but exposes an attack surface in workflow formation: prompts can shape agent organization without modifying MAS infrastructure. We study this risk through social influence probing workflows to identify high-impact subtasks and malicious-signal propagation. The analysis reveals two vulnerabilities: workflow position can amplify or suppress a malicious signal, and sycophantic framing makes downstream agents more likely to relay it. We translate these findings into FlowSteer, a prompt-only workflow steering attack that converts vulnerability priors into one crafted prompt. FlowSteer aligns a malicious signal with influential task components…
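To make the idea of workflow position concrete, here is a minimal sketch, not taken from the paper, that models a planner's output as a dependency graph and counts how many downstream subtasks each position can reach. The workflow, the subtask names, and the `downstream_reach` helper are all hypothetical, and reachability is only a crude stand-in for the influence the abstract describes.

```python
from collections import deque

# Hypothetical planner output: each subtask maps to the subtasks
# that consume its result (an assumption for illustration only).
WORKFLOW = {
    "plan": ["research", "draft"],
    "research": ["draft"],
    "draft": ["review"],
    "review": [],
}


def downstream_reach(workflow, start):
    """Return the set of subtasks reachable from `start`, excluding itself."""
    seen = set()
    queue = deque(workflow[start])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(workflow[node])
    return seen


# Number of downstream subtasks that could receive output from each position.
reach = {task: len(downstream_reach(WORKFLOW, task)) for task in WORKFLOW}
```

In this toy graph an early subtask such as `plan` feeds every later one, while `review` feeds none, which is one simple way to see why the position at which content enters a workflow matters when auditing it.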
