Semantic Denial of Service in LLM-controlled robots
Jonathan Steinberg, Oren Gal

TL;DR
This paper demonstrates that safety-oriented instruction-following in LLM-controlled robots can be exploited to mount denial-of-service attacks through short audio injections (1-5 tokens), exposing an availability vulnerability.
Contribution
The paper identifies a semantic denial-of-service attack on LLM-controlled robots that works through audio injection, evaluates prompt-level defenses against it, and highlights architectural security issues.
Findings
Prompt-only defenses trade off attack suppression against genuine hazard response.
Injection variety is more effective for attacks than repetition.
Defenses shift the form of disruption from hard stops to false alerts.
Abstract
Safety-oriented instruction-following is supposed to keep LLM-controlled robots safe. We show it also creates an availability attack surface. By injecting short safety-plausible phrases (1-5 tokens) into a robot's audio channel, an adversary can trigger the model's safety reasoning to halt or disrupt execution without jailbreaking the model or overriding its policy. In the embodied setting, this is a semantic denial-of-service attack: the agent stops because the injected signal looks like a legitimate alert. Across four vision-language models, seven prompt-level defenses, three deployment modes, and single- and multi-injection settings, we find that prompt-only defenses trade off attack suppression against genuine hazard response. The strongest defenses reduce hard-stop attack success on some models, but defenses change the form of disruption, not its fact: suppressed hard stops re-emerge…
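The trade-off the abstract describes, attack suppression against genuine hazard response, can be made concrete with a small tally. The sketch below is illustrative only and is not the authors' evaluation code: the `Trial` record, its fields, and both rate functions are hypothetical names, and the trial list is toy data chosen to show the arithmetic, not results from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Trial:
    """One hypothetical evaluation episode (assumed schema, not from the paper)."""
    injected: bool        # an adversarial phrase was played into the audio channel
    hazard_present: bool  # a genuine hazard existed in the scene
    halted: bool          # the agent stopped or aborted its task


def halt_rate(trials: Sequence[Trial],
              keep: Callable[[Trial], bool]) -> Optional[float]:
    """Fraction of selected trials in which the agent halted; None if none selected."""
    selected = [t for t in trials if keep(t)]
    if not selected:
        return None
    return sum(t.halted for t in selected) / len(selected)


def attack_success_rate(trials: Sequence[Trial]) -> Optional[float]:
    # Halts caused by an injection when no real hazard exists: lower is better.
    return halt_rate(trials, lambda t: t.injected and not t.hazard_present)


def hazard_response_rate(trials: Sequence[Trial]) -> Optional[float]:
    # Halts on genuine hazards with no injection: higher is better.
    return halt_rate(trials, lambda t: t.hazard_present and not t.injected)


# Toy data for illustration only.
trials = [
    Trial(injected=True, hazard_present=False, halted=True),
    Trial(injected=True, hazard_present=False, halted=False),
    Trial(injected=True, hazard_present=False, halted=True),
    Trial(injected=True, hazard_present=False, halted=True),
    Trial(injected=False, hazard_present=True, halted=True),
    Trial(injected=False, hazard_present=True, halted=False),
]

print(attack_success_rate(trials))   # 3 of 4 injected trials halted
print(hazard_response_rate(trials))  # 1 of 2 genuine hazards halted
```

Under this framing, a prompt-level defense that simply makes the agent less willing to stop would lower both numbers at once, which is the tension the abstract points to. A tally like this counts only hard stops, so it would not capture disruption that takes another form.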
