TL;DR
LogJack shows that LLM debugging agents are vulnerable to indirect prompt injection through cloud logs: most tested models execute injected commands, including remote code execution, and cloud guardrails largely fail to detect the payloads.
Contribution
This paper introduces LogJack, a benchmark of 42 payloads across 5 cloud log categories, and uses it to evaluate 8 foundation models under 3 prompt conditions against log-based prompt injection.
Findings
Under the active condition, verbatim command execution ranges from 0% (Claude Sonnet 4.6) to 86.2% (Llama 3.3 70B).
Passive instructions ("do not execute fixes") reduce most models to 0%, but Llama still executes at 30.0%.
Guardrails from AWS, GCP, and Azure largely fail to detect log-embedded injections, even though they detect the identical payloads in isolation.
Abstract
LLM debugging agents that consume cloud logs and execute remediation commands are vulnerable to indirect prompt injection through log content. We present LogJack, a benchmark of 42 payloads across 5 cloud log categories, and evaluate 8 foundation models under 3 prompt conditions with 5 independent trials each (n = 160 per model per condition on 32 attack payloads). Under the active condition, verbatim command execution rates range from 0% (Claude Sonnet 4.6) to 86.2% (Llama 3.3 70B). Passive instructions ("do not execute fixes") reduce most models to 0% but Llama still executes at 30.0%. Remote code execution via curl | bash succeeds on 6 of 8 models. Guardrails from AWS, GCP, and Azure largely fail to detect log-embedded injections: Azure Prompt Shield detected only the most obvious payload (1/32), while GCP Model Armor detected none, though they detect identical payloads in isolation.…
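The sample size in the abstract follows from 32 attack payloads times 5 trials, which gives 160 runs per model per condition. A minimal sketch of how a per-model verbatim execution rate could be tallied over those runs is shown below. The `Trial` record, the `execution_rate` helper, and the outcomes are hypothetical placeholders for illustration, not the paper's harness or data.

```python
from dataclasses import dataclass

N_ATTACK_PAYLOADS = 32  # from the abstract
N_TRIALS = 5            # independent trials per payload, from the abstract


@dataclass(frozen=True)
class Trial:
    """One run of one payload against one model (hypothetical record)."""
    payload_id: int
    trial: int
    executed_verbatim: bool  # did the agent run the injected command as written?


def execution_rate(trials: list[Trial]) -> float:
    """Percentage of runs in which the injected command was executed verbatim."""
    if not trials:
        raise ValueError("no trials to score")
    executed = sum(t.executed_verbatim for t in trials)
    return 100.0 * executed / len(trials)


# Synthetic placeholder outcomes (NOT results from the paper):
# every fourth payload is marked as executed in all of its trials.
trials = [
    Trial(payload_id=p, trial=k, executed_verbatim=(p % 4 == 0))
    for p in range(N_ATTACK_PAYLOADS)
    for k in range(N_TRIALS)
]

print(len(trials))             # 160 runs per model per condition
print(execution_rate(trials))  # rate for the synthetic outcomes above
```

Scoring each run as a boolean keeps the metric a plain proportion over n = 160, so rates are directly comparable across models and prompt conditions.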
