Safeguarding LLMs Against Misuse and AI-Driven Malware Using Steganographic Canaries
Md Raz, Venkata Sai Charan Putrevu, Meet Udeshi, Prashanth Krishnamurthy, Farshad Khorrami, Ramesh Karri

TL;DR
This paper introduces a steganographic canary framework to detect and prevent misuse of large language models by embedding cryptographic identifiers in documents, enabling passive detection before processing.
Contribution
The paper presents a layered steganographic approach that combines symbolic and linguistic methods to identify sensitive documents before LLM processing, addressing the limited visibility organizations have once plaintext reaches an LLM service.
Findings
100% identifier recovery in benign workflows
97% detection rate under targeted adversarial transforms
Effective detection and blocking in a ransomware case study
Abstract
AI-powered malware increasingly exploits cloud-hosted generative-AI services and large language models (LLMs) as analysis engines for reconnaissance and code generation. Simultaneously, enterprise uploads expose sensitive documents to third-party AI vendors. Both threats converge at the AI service ingestion boundary, yet existing defenses focus on endpoints and network perimeters, leaving organizations with limited visibility once plaintext reaches an LLM service. To address this, we present a framework based on steganographic canary files: realistic documents carrying cryptographically derived identifiers embedded via complementary encoding channels. A pre-ingestion filter extracts and verifies these identifiers before LLM processing, enabling passive, format-agnostic detection without semantic classification. We support two modes of operation where Mode A marks existing sensitive…
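The embed-then-verify flow described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the identifier is a truncated HMAC-SHA256 tag over a document ID, the only encoding channel is zero-width Unicode characters (a single symbolic channel, with no linguistic channel), and the names `derive_payload`, `embed`, `extract_and_verify`, and `pre_ingestion_filter` are hypothetical.

```python
import hmac
import hashlib

# Hypothetical parameters for this sketch (not from the paper).
ZW0, ZW1 = "\u200b", "\u200c"  # zero-width space / non-joiner encode bits 0 / 1
ID_BYTES = 4                   # length of the document identifier
TAG_BYTES = 8                  # length of the truncated HMAC tag


def derive_payload(key: bytes, doc_id: int) -> bytes:
    """Return doc_id followed by a truncated HMAC-SHA256 tag over it."""
    id_bytes = doc_id.to_bytes(ID_BYTES, "big")
    tag = hmac.new(key, id_bytes, hashlib.sha256).digest()[:TAG_BYTES]
    return id_bytes + tag


def embed(text: str, payload: bytes) -> str:
    """Hide the payload as zero-width characters after the first word."""
    bits = "".join(f"{b:08b}" for b in payload)
    hidden = "".join(ZW1 if bit == "1" else ZW0 for bit in bits)
    head, sep, tail = text.partition(" ")
    return head + hidden + sep + tail


def extract_and_verify(text: str, key: bytes):
    """Return the doc_id if a valid canary is present, else None."""
    bits = "".join("1" if ch == ZW1 else "0" for ch in text if ch in (ZW0, ZW1))
    n = (ID_BYTES + TAG_BYTES) * 8
    if len(bits) < n:
        return None
    raw = int(bits[:n], 2).to_bytes(ID_BYTES + TAG_BYTES, "big")
    id_bytes, tag = raw[:ID_BYTES], raw[ID_BYTES:]
    expected = hmac.new(key, id_bytes, hashlib.sha256).digest()[:TAG_BYTES]
    # Constant-time comparison; no semantic classification of the text is needed.
    if hmac.compare_digest(tag, expected):
        return int.from_bytes(id_bytes, "big")
    return None


def pre_ingestion_filter(text: str, key: bytes) -> bool:
    """Return True if the text may be forwarded to the LLM, False if blocked."""
    return extract_and_verify(text, key) is None
```

In this sketch the filter only needs the secret key, not the document contents, to decide whether to block. A single zero-width channel is easy to strip, which is presumably one reason the paper layers complementary channels; the sketch does not attempt to reproduce that robustness.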
