TL;DR
This paper identifies and analyzes persistent prompt injection vulnerabilities in always-on autonomous AI agents, proposing tiered defenses and a static audit tool, with a prototype implementation available on GitHub.
Contribution
It introduces the concept of sleeper channels, provides a formal defense framework with a soundness theorem, and delivers a static audit tool with runtime mediation hooks.
Findings
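End-to-end confused-deputy cron attack demonstrated on OpenClaw at a pinned commit
Defense tier D2 carries a soundness theorem relative to seven named deployment invariants and defeats paraphrase laundering, multi-input grant reuse, and replay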
A static audit tool and runtime adapter prototype are provided
Abstract
Always-on AI agents (OpenClaw, Hermes Agent) run as a single persistent process under the owner's identity, folding messaging, memory, self-authored skills, scheduling, and shell into one authority boundary. This configuration opens what we call sleeper channels: an untrusted input to one surface persists as a memory, skill, scheduled job, or filesystem patch, then fires later through a different surface with no attacker present. Two independent axes define the class: persistence substrate and firing-separation. We walk a confused-deputy cron attack end-to-end through OpenClaw at a pinned commit. The defense is tiered (D1, D2, D3), and D2 carries a soundness theorem against seven named deployment invariants. D2 keys on a canonical action-instance digest with one-shot owner attestations, defeating paraphrase laundering, multi-input grant reuse, and replay. A companion artifact…
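The abstract describes D2 only at this level of detail. The following Python is a minimal sketch, under our own assumptions, of how a mediator might key one-shot owner attestations on a canonical action-instance digest. `ActionInstance`, `action_digest`, `AttestationMediator`, the sorted-key JSON canonicalization, the HMAC-based attestation, and the `cron.add` tool name are all hypothetical illustrations, not the paper's implementation or any OpenClaw API.

```python
import hashlib
import hmac
import json
import secrets
from dataclasses import dataclass


@dataclass
class ActionInstance:
    """Hypothetical concrete action the agent wants to execute."""
    tool: str              # illustrative tool name, e.g. "cron.add"
    args: dict             # fully resolved, JSON-serializable arguments
    input_ids: frozenset   # provenance: IDs of inputs that contributed


def action_digest(action: ActionInstance) -> str:
    """Assumed canonical digest: sorted-key JSON over tool, args, provenance.

    The digest binds a grant to the concrete action instance rather than to
    the wording of any request, so argument key order does not matter but
    any change to tool, arguments, or contributing inputs does.
    """
    canonical = json.dumps(
        {"tool": action.tool, "args": action.args,
         "inputs": sorted(action.input_ids)},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class AttestationMediator:
    """Sketch of one-shot owner attestations keyed on action digests."""

    def __init__(self, owner_key: bytes):
        self._key = owner_key   # assumed secret held only by the owner path
        self._unused = set()    # nonces issued but not yet consumed

    def _tag(self, digest: str, nonce: str) -> str:
        # The digest is fixed-length hex, so digest + nonce is unambiguous.
        msg = (digest + nonce).encode("utf-8")
        return hmac.new(self._key, msg, hashlib.sha256).hexdigest()

    def attest(self, action: ActionInstance):
        """Owner approves exactly one execution of this action instance."""
        nonce = secrets.token_hex(16)
        self._unused.add(nonce)
        return nonce, self._tag(action_digest(action), nonce)

    def authorize(self, action: ActionInstance, nonce: str, tag: str) -> bool:
        """Admit the action only if the tag matches its digest and is unused."""
        expected = self._tag(action_digest(action), nonce)
        if not hmac.compare_digest(expected, tag):
            return False        # attestation covers a different action instance
        if nonce not in self._unused:
            return False        # already consumed: replay
        self._unused.remove(nonce)
        return True
```

In this sketch, because the digest covers the set of contributing inputs, an attestation issued for an action derived from one input does not authorize the same call once a second input has contributed to it, and consuming the nonce on first successful use blocks replay. Whether the paper's digest includes provenance in this way is an assumption here.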
