Quantamination: Dynamic Quantization Leaks Your Data Across the Batch
Hanna Foerster, Ilia Shumailov, Cheng Zhang, Yiren Zhao, Jamie Hayes, Robert Mullins

TL;DR
This paper uncovers a privacy vulnerability called Quantamination, where dynamic quantization in machine learning frameworks can leak sensitive data across batch boundaries, risking user privacy during model serving.
Contribution
The paper identifies and analyzes a critical side-channel vulnerability in dynamic quantization, demonstrating its potential to leak user data across batch boundaries in popular ML frameworks.
Findings
At least 4 popular ML frameworks leak data via Quantamination.
Dynamic quantization can be exploited to partially or fully recover other users' batched data.
The vulnerability arises from improper implementation or configuration of dynamic quantization.
Abstract
Dynamic quantization has emerged as a practical approach to increase the utilization and efficiency of the machine learning serving flow. Unlike static quantization, which applies quantization offline, dynamic quantization operates on tensors at run-time, adapting its parameters to the actual input data. Today's mainstream machine learning frameworks, including ML compilers and inference engines, frequently recommend dynamic quantization as an initial step for optimizing model serving. This is because dynamic quantization can significantly reduce memory usage and computational load, leading to faster token generation and improved model serving efficiency without substantial loss in model accuracy. In this paper, we reveal a critical vulnerability in dynamic quantization: an adversary can exploit this quantization strategy to steal sensitive user data placed in the same batch as the…
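The run-time adaptation described in the abstract is what couples requests that share a batch. The toy sketch below assumes a symmetric int8 scheme whose scale is max|x|/127. The names `QMAX`, `scale_from`, `fake_quant`, `static_quant`, `dynamic_per_tensor`, and `dynamic_per_row` are hypothetical; this illustrates the coupling only and is not the authors' attack or any framework's kernel.

```python
# Toy model of symmetric int8 quantization (all helper names are hypothetical).
QMAX = 127

def scale_from(values):
    """Scale that maps the largest magnitude in `values` to QMAX."""
    amax = max(abs(v) for v in values)
    return amax / QMAX if amax > 0 else 1.0

def fake_quant(row, scale):
    """Quantize to clamped int8 codes, then dequantize back to floats."""
    return [max(-QMAX, min(QMAX, round(v / scale))) * scale for v in row]

def static_quant(batch, calibrated_scale):
    # Static: the scale was fixed offline, so no row influences another.
    return [fake_quant(row, calibrated_scale) for row in batch]

def dynamic_per_tensor(batch):
    # Dynamic, per-tensor: one run-time scale over the whole batch,
    # so every row's result depends on every other row in the batch.
    scale = scale_from([v for row in batch for v in row])
    return [fake_quant(row, scale) for row in batch]

def dynamic_per_row(batch):
    # Dynamic, per-row: each row gets its own run-time scale.
    return [fake_quant(row, scale_from(row)) for row in batch]

attacker = [0.013, 0.027, 0.041]   # the attacker's own request
victim_small = [0.5, -0.2]         # two possible co-batched requests
victim_large = [50.0, -20.0]

seen_small = dynamic_per_tensor([attacker, victim_small])[0]
seen_large = dynamic_per_tensor([attacker, victim_large])[0]
print(seen_small != seen_large)  # True: the attacker's output depends on the victim's data
print(seen_large)                # [0.0, 0.0, 0.0]: a large co-batched value rounds these to zero

print(dynamic_per_row([attacker, victim_small])[0] ==
      dynamic_per_row([attacker, victim_large])[0])     # True
print(static_quant([attacker, victim_small], 0.01)[0] ==
      static_quant([attacker, victim_large], 0.01)[0])  # True
```

In this toy model, per-row scaling and a fixed static scale both remove the cross-row dependence. Real frameworks differ in quantization granularity, and the paper attributes the leak to how dynamic quantization is implemented or configured rather than to one specific setting.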
