Safeguarding LLMs Against Misuse and AI-Driven Malware Using Steganographic Canaries

Md Raz; Venkata Sai Charan Putrevu; Meet Udeshi; Prashanth Krishnamurthy; Farshad Khorrami; Ramesh Karri

arXiv:2603.28655·cs.CR·March 31, 2026

TL;DR

This paper introduces a steganographic canary framework for detecting and blocking misuse of large language models: documents carry embedded cryptographic identifiers, and a pre-ingestion filter checks for them passively before the LLM processes the content.

Contribution

It presents a novel layered steganographic approach that combines symbolic and linguistic encoding to identify sensitive documents before LLM processing, addressing the visibility gap at the AI service ingestion boundary.

Findings

1. 100% identifier recovery in benign workflows
2. 97% detection rate under targeted adversarial transforms
3. Effective detection and blocking in a ransomware case study

Abstract

AI-powered malware increasingly exploits cloud-hosted generative-AI services and large language models (LLMs) as analysis engines for reconnaissance and code generation. Simultaneously, enterprise uploads expose sensitive documents to third-party AI vendors. Both threats converge at the AI service ingestion boundary, yet existing defenses focus on endpoints and network perimeters, leaving organizations with limited visibility once plaintext reaches an LLM service. To address this, we present a framework based on steganographic canary files: realistic documents carrying cryptographically derived identifiers embedded via complementary encoding channels. A pre-ingestion filter extracts and verifies these identifiers before LLM processing, enabling passive, format-agnostic detection without semantic classification. We support two modes of operation where Mode A marks existing sensitive…
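To make the extract-and-verify idea concrete, here is a minimal sketch of a pre-ingestion check. It is not the authors' implementation: the abstract does not specify the encoding channels, so this assumes a single symbolic channel built from zero-width Unicode characters and an identifier formed from a document label plus a truncated HMAC-SHA256 tag. The names `derive_identifier`, `embed`, `extract`, and `should_block`, the bit alphabet, and the 8-byte tag length are all hypothetical choices made for illustration.

```python
import hashlib
import hmac

# Assumed bit alphabet (not from the paper): U+200B encodes 0, U+200C encodes 1.
ZERO, ONE = "\u200b", "\u200c"
TAG_BYTES = 8  # truncated HMAC length; an illustrative choice


def derive_identifier(key: bytes, doc_id: str) -> bytes:
    """Return the doc_id bytes followed by a truncated HMAC-SHA256 tag."""
    payload = doc_id.encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).digest()[:TAG_BYTES]
    return payload + tag


def embed(text: str, identifier: bytes) -> str:
    """Hide the identifier as zero-width characters after the first character."""
    bits = "".join(f"{byte:08b}" for byte in identifier)
    hidden = "".join(ONE if bit == "1" else ZERO for bit in bits)
    return text[:1] + hidden + text[1:]


def extract(text: str) -> bytes:
    """Collect zero-width characters in order and decode them into bytes."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


def should_block(key: bytes, text: str) -> bool:
    """Return True when the text carries an identifier whose tag verifies."""
    recovered = extract(text)
    if len(recovered) <= TAG_BYTES:
        return False  # nothing embedded, or too short to hold a payload and tag
    payload, tag = recovered[:-TAG_BYTES], recovered[-TAG_BYTES:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()[:TAG_BYTES]
    return hmac.compare_digest(tag, expected)


if __name__ == "__main__":
    key = b"demo-key"
    marked = embed("Quarterly forecast", derive_identifier(key, "canary-001"))
    print(should_block(key, marked))                # marked canary document
    print(should_block(key, "Quarterly forecast"))  # unmarked document
```

Because the filter only verifies a keyed tag, it never has to classify what the document says, which is the property the abstract calls detection "without semantic classification". A single zero-width channel is easy to strip by normalizing the text, which is presumably why the paper layers complementary channels; this sketch makes no attempt to reproduce that robustness.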
