Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers

Haochuan Kevin Wang; Zechen Zhang

arXiv:2603.28013 · cs.CR · April 13, 2026

TL;DR

This paper introduces a stage-level tracking methodology for prompt injection in multi-agent LLM systems, revealing how different models and defenses perform across attack surfaces and informing deployment safety decisions.

Contribution

The paper presents a kill-chain canary approach, built on a cryptographic token, that diagnoses where prompt injection succeeds or is stopped across pipeline stages and defenses in production-like settings.

Findings

01. Write-node placement is crucial for safety; verified routing prevents propagation.

02. Every defense fails on at least one surface, even without adversarial adaptation of the attack.

03. Invisible payloads can match or surpass visible-text attack success rates.

Abstract

Multi-agent LLM systems are entering production -- processing documents, managing workflows, acting on behalf of users -- yet their resilience to prompt injection is still evaluated with a single binary: did the attack succeed? This leaves architects without the diagnostic information needed to harden real pipelines. We introduce a kill-chain canary methodology that tracks a cryptographic token through four stages (EXPOSED -> PERSISTED -> RELAYED -> EXECUTED) across 950 runs, five frontier LLMs, six attack surfaces, and five defense conditions. The results reframe prompt injection as a pipeline-architecture problem: every model is fully exposed, yet outcomes diverge downstream -- Claude blocks all injections at memory-write (0/164 ASR), GPT-4o-mini propagates at 53%, and DeepSeek exhibits 0%/100% across surfaces from the same model. Three findings matter for deployment: (1) write-node…
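The stage-level tracking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Stage`, `new_canary`, `furthest_stage`, and `attack_success_rate` are hypothetical, the token format is invented for illustration, and the sketch assumes that each stage yields a text artifact that can be searched for the token and that a run counts as a successful attack only when the token reaches EXECUTED.

```python
import secrets
from enum import IntEnum


class Stage(IntEnum):
    """Kill-chain stages named in the abstract, plus NONE for 'never seen'."""
    NONE = 0
    EXPOSED = 1
    PERSISTED = 2
    RELAYED = 3
    EXECUTED = 4


def new_canary(prefix="CANARY"):
    """Return a unique random token (hypothetical format) to embed in a payload."""
    return f"{prefix}-{secrets.token_hex(8)}"


def furthest_stage(canary, artifacts):
    """Return the last consecutive stage whose captured text contains the token.

    `artifacts` maps a Stage to the text captured at that stage (an assumed
    logging layout). The walk stops at the first stage without the token,
    treating the chain as strictly sequential.
    """
    reached = Stage.NONE
    for stage in (Stage.EXPOSED, Stage.PERSISTED, Stage.RELAYED, Stage.EXECUTED):
        if canary in artifacts.get(stage, ""):
            reached = stage
        else:
            break
    return reached


def attack_success_rate(results):
    """Fraction of runs whose token reached EXECUTED (assumed ASR definition)."""
    if not results:
        return 0.0
    return sum(1 for r in results if r == Stage.EXECUTED) / len(results)
```

Recording the furthest stage per run, rather than a single success flag, is what lets a result such as "exposed but blocked at memory-write" be told apart from "never exposed".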
