An Application-Layer Multi-Modal Covert-Channel Reference Monitor for LLM Agent Egress
Alfredo Metere

TL;DR
This paper presents an application-layer reference monitor for LLM agent egress that detects and mitigates covert channels in text, image, and audio by combining capacity-reducing stages, cryptographic attestation, and residual-capacity measurement.
Contribution
It introduces a multi-stage, capacity-reducing text pipeline, media scramblers, and cryptographic verification to prevent covert data exfiltration from LLM agents.
Findings
Residual capacity is driven to zero on destroyable channels.
The system enforces lossless stages from day one.
It measures mutual information to verify channel destruction.
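The mutual-information check named in the findings can be illustrated with a plug-in estimator over paired samples: the bits a test encoder tried to send and the bits a decoder recovered after the monitor processed the payload. This is a minimal sketch, not the paper's estimator (which this summary does not specify); the function name `empirical_mutual_information` is hypothetical. Plug-in estimates are biased upward on small samples, so a real measurement would need many samples or a bias correction.

```python
import math
from collections import Counter


def empirical_mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired samples.

    xs: symbols the sender embedded (assumed test input).
    ys: symbols the receiver decoded after the monitor ran.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


sent = [0, 1, 0, 1]
intact = empirical_mutual_information(sent, sent)             # channel untouched
destroyed = empirical_mutual_information(sent, [0, 0, 0, 0])  # output carries no signal
```

An estimate near zero on the second call is what "channel destroyed" would look like under this sketch; the first call shows the full one bit per symbol surviving.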
Abstract
A large language model (LLM) agent that sends messages can leak data inside them. Destination allowlists and content scanners do not police whether an otherwise-benign payload is itself a covert channel: a compromised agent can encode bits in zero-width characters, homoglyphs, whitespace, base64, JavaScript Object Notation (JSON) key ordering, message timing or size -- and, in binary egress, in least-significant-bit (LSB) pixel planes, per-image mean luminance, inter-image sequence permutation, ultrasonic tones, or audible-band sonified data. Our egress reference monitor makes three contributions. (i) A text pipeline of ten capacity-reducing stages, a per-sink leaky-bucket capacity ledger, and a staged posture that enforces lossless stages from day one. (ii) Two media scramblers (a Fourier-domain audio band-limiter and a red-green-blue (RGB) image bit-depth and mean-luminance bucketer) gated…
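Two of the text-side ideas in the abstract, a capacity-reducing stage and a per-sink leaky-bucket ledger, can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the names `scrub_text` and `CapacityLedger`, the set of characters removed, and the burst and leak parameters are all hypothetical. The scrubber strips zero-width characters and collapses whitespace; NFKC normalization folds compatibility variants such as fullwidth letters but does not resolve cross-script homoglyphs, which would need a separate confusables table.

```python
import re
import unicodedata

# Invisible characters often used to hide bits in text (assumed, non-exhaustive set).
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def scrub_text(text: str) -> str:
    """Hypothetical capacity-reducing stage for outgoing text."""
    text = unicodedata.normalize("NFKC", text)  # fold compatibility variants
    text = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    text = re.sub(r"[ \t]+", " ", text)         # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)        # drop spaces around newlines
    return text.strip()


class CapacityLedger:
    """Hypothetical per-sink leaky bucket over estimated residual bits."""

    def __init__(self, burst_bits: float, leak_bits_per_s: float):
        self.burst_bits = burst_bits
        self.leak_bits_per_s = leak_bits_per_s
        self._state = {}  # sink -> (current level in bits, time of last update)

    def charge(self, sink: str, bits: float, now: float) -> bool:
        """Return True and record the charge if the sink stays within budget."""
        level, last = self._state.get(sink, (0.0, now))
        # The bucket drains at a constant rate between calls.
        level = max(0.0, level - (now - last) * self.leak_bits_per_s)
        if level + bits > self.burst_bits:
            self._state[sink] = (level, now)  # refuse; keep the drained level
            return False
        self._state[sink] = (level + bits, now)
        return True


clean = scrub_text("he\u200bllo  \t world ")
ledger = CapacityLedger(burst_bits=10, leak_bits_per_s=1)
first = ledger.charge("sink-a", 8, now=0.0)   # accepted: bucket at 8 of 10
second = ledger.charge("sink-a", 5, now=1.0)  # refused: 7 + 5 exceeds 10
third = ledger.charge("sink-a", 5, now=4.0)   # accepted: bucket drained to 4
```

Passing `now` explicitly keeps the sketch deterministic; a deployment would more likely read a monotonic clock. Tracking state per sink means a refused charge on one destination does not affect the budget of another.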
