Aggressive Compression Enables LLM Weight Theft
Davis Brown, Juan-Pablo Rivera, Dan Hendrycks, Mantas Mazeika

TL;DR
This paper shows that aggressive compression of large language model weights can significantly increase the risk of theft by enabling faster exfiltration, and explores defenses including making models harder to compress and using forensic watermarks.
Contribution
It introduces the concept of compression-based exfiltration attacks on LLMs and evaluates effective defenses like forensic watermarks.
Findings
Attackers can achieve 16x to 100x compression of model weights with minimal trade-offs.
Compression reduces the time needed to exfiltrate the weights from months to days.
Forensic watermarks are effective and inexpensive defenses.
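The link between compression ratio and exfiltration time is simple arithmetic: transfer time is the compressed size divided by the bandwidth available to the attacker. The sketch below illustrates that relationship only. The function name `exfiltration_days`, the 1 TB weight size, and the 100 KB/s covert egress rate are all assumptions chosen for illustration and are not taken from the paper; only the 16x and 100x ratios come from the findings above.

```python
SECONDS_PER_DAY = 86_400


def exfiltration_days(weight_bytes, egress_bytes_per_sec, compression_ratio=1.0):
    """Days needed to move the (compressed) weights at a fixed egress rate.

    Hypothetical helper: it ignores protocol overhead, detection, and any
    rate limiting that varies over time.
    """
    if weight_bytes <= 0 or egress_bytes_per_sec <= 0 or compression_ratio <= 0:
        raise ValueError("all arguments must be positive")
    compressed_bytes = weight_bytes / compression_ratio
    return compressed_bytes / egress_bytes_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumed values, for illustration only (not from the paper).
    weight_bytes = 1e12          # hypothetical 1 TB of model weights
    egress_bytes_per_sec = 1e5   # hypothetical 100 KB/s covert channel

    # 1x is the uncompressed baseline; 16x and 100x are the ratios quoted above.
    for ratio in (1, 16, 100):
        days = exfiltration_days(weight_bytes, egress_bytes_per_sec, ratio)
        print(f"{ratio:>3}x compression: {days:.1f} days")
```

Because the time scales inversely with the compression ratio, any defense that lowers the achievable ratio or the usable egress rate lengthens the attack by the same factor.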
Abstract
As frontier AIs become more powerful and costly to develop, adversaries have increasing incentives to steal model weights by mounting exfiltration attacks. In this work, we consider exfiltration attacks where an adversary attempts to sneak model weights out of a datacenter over a network. While exfiltration attacks are multi-step cyber attacks, we demonstrate that a single factor, the compressibility of model weights, significantly heightens exfiltration risk for large language models (LLMs). We tailor compression specifically for exfiltration by relaxing decompression constraints and demonstrate that attackers could achieve 16x to 100x compression with minimal trade-offs, reducing the time it would take for an attacker to illicitly transmit model weights from the defender's server from months to days. Finally, we study defenses designed to reduce exfiltration risk in three distinct…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Digital and Cyber Forensics
