Verifying LLM Inference to Detect Model Weight Exfiltration

Roy Rinberg; Adam Karvonen; Alexander Hoover; Daniel Reuter; Keri Warr

arXiv:2511.02620 · cs.CR · March 16, 2026

Open Access

TL;DR

This paper presents a verification framework to detect and mitigate model weight exfiltration via steganography during large language model inference, significantly reducing information leakage with minimal performance impact.

Contribution

It formalizes model exfiltration as a security game, proposes a provably effective verification scheme, and introduces practical estimators for non-determinism in LLM inference.

Findings

01. The detector reduces exfiltratable information to <0.5% on 30B-parameter models.

02. The false-positive rate is below 0.01%.

03. Adversaries face a slowdown of more than 200x.
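
The first and third findings appear to be two views of the same number. As a rough sketch, assuming the slowdown is simply the reciprocal of the fraction of output bandwidth an adversary can still use (the function name `slowdown_factor` is hypothetical and not taken from the paper):

```python
def slowdown_factor(leak_fraction: float) -> float:
    """Return how many times longer exfiltration takes when only
    `leak_fraction` of the original covert bandwidth remains.

    Assumption: exfiltration time scales inversely with usable bandwidth.
    """
    if not 0.0 < leak_fraction <= 1.0:
        raise ValueError("leak_fraction must be in (0, 1]")
    return 1.0 / leak_fraction


# 0.5% of the bandwidth remaining corresponds to a 200x slowdown.
factor = slowdown_factor(0.005)
```

Under this assumption, any leakage below 0.5% gives a slowdown above 200x, which matches the figures quoted above.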

Abstract

As large AI models become increasingly valuable assets, the risk of model weight exfiltration from inference servers grows accordingly. An attacker controlling an inference server may exfiltrate model weights by hiding them within ordinary model responses, a strategy known as steganography. This work investigates how to verify LLM model inference to defend against such attacks and, more broadly, to detect anomalous or buggy behavior during inference. We formalize model weight exfiltration as a security game, propose a verification framework that can provably mitigate steganographic exfiltration, and specify the trust assumptions associated with our scheme. To enable verification, we characterize valid sources of non-determinism in large language model inference and introduce two practical estimators for them. We evaluate our detection framework on several open-weight models ranging from…
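
To make the idea of verifying inference under non-determinism concrete, here is a toy sketch. It is not the paper's estimators or protocol. It assumes, purely for illustration, that the server and verifier share a sampling seed, that tokens are drawn with the Gumbel-max trick, and that numerical noise is absorbed by a fixed `margin`. The names `gumbel_noise`, `sample_token`, and `is_plausible` are hypothetical.

```python
import math
import random


def gumbel_noise(seed: int, vocab_size: int) -> list[float]:
    """Deterministic Gumbel(0, 1) noise for one sampling step."""
    rng = random.Random(seed)
    noise = []
    for _ in range(vocab_size):
        u = max(rng.random(), 1e-12)  # avoid log(0)
        noise.append(-math.log(-math.log(u)))
    return noise


def sample_token(logits: list[float], seed: int) -> int:
    """Gumbel-max sampling: argmax of logits plus seeded noise."""
    noise = gumbel_noise(seed, len(logits))
    scores = [l + g for l, g in zip(logits, noise)]
    return max(range(len(scores)), key=scores.__getitem__)


def is_plausible(logits: list[float], seed: int,
                 claimed_token: int, margin: float) -> bool:
    """Verifier check: accept the claimed token only if its perturbed
    score is within `margin` of the best score the verifier recomputes.
    `margin` stands in for tolerated numerical non-determinism."""
    noise = gumbel_noise(seed, len(logits))
    scores = [l + g for l, g in zip(logits, noise)]
    return max(scores) - scores[claimed_token] <= margin


# Honest server whose logits differ slightly from the verifier's.
verifier_logits = [2.0, 1.0, 0.5, -1.0]
server_logits = [l + 1e-4 for l in verifier_logits[:2]] + verifier_logits[2:]
honest = sample_token(server_logits, seed=7)
accepted = is_plausible(verifier_logits, 7, honest, margin=1e-3)
```

In this toy setting, an honest token always passes when the logit drift is small relative to `margin`, while a token chosen to encode hidden data usually fails. The smaller the margin can be made, the fewer tokens an attacker can freely choose among.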

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Explainable Artificial Intelligence (XAI)