GPU-based Private Information Retrieval for On-Device Machine Learning Inference
Maximilian Lam, Jeff Johnson, Wenjie Xiong, Kiwan Maeng, Udit Gupta, Yang Li, Liangzhen Lai, Ilias Leontiadis, Minsoo Rhu, Hsien-Hsin S. Lee, Vijay Janapa Reddi, Gu-Yeon Wei, David Brooks, G. Edward Suh

TL;DR
This paper introduces a GPU-accelerated private information retrieval system tailored for on-device machine learning inference, significantly enhancing throughput for privacy-preserving embedding retrieval in applications like recommendations.
Contribution
The paper presents novel GPU-based acceleration techniques for PIR and a co-design approach with ML applications, enabling practical, high-throughput private inference on user devices.
Findings
GPU acceleration improves throughput by over 20x compared to CPU implementations.
PIR-ML co-design yields an additional 5x throughput increase.
The system can serve up to 100,000 queries per second on a single GPU.
Abstract
On-device machine learning (ML) inference can enable the use of private user data on user devices without revealing them to remote servers. However, a pure on-device solution to private ML inference is impractical for many applications that rely on embedding tables that are too large to be stored on-device. In particular, recommendation models typically use multiple embedding tables each on the order of 1-10 GBs of data, making them impractical to store on-device. To overcome this barrier, we propose the use of private information retrieval (PIR) to efficiently and privately retrieve embeddings from servers without sharing any private information. As off-the-shelf PIR algorithms are usually too computationally intensive to directly use for latency-sensitive inference tasks, we 1) propose novel GPU-based acceleration of PIR, and 2) co-design PIR with the downstream ML application to…
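To make the retrieval idea concrete, here is a minimal sketch of private information retrieval over a toy embedding table. It assumes a classic two-server XOR scheme with two non-colluding servers that each hold a copy of the table; the abstract as shown here does not specify the paper's actual PIR construction, so this is an illustration of the general idea rather than the authors' method. All names (`make_queries`, `server_answer`, `reconstruct`, `table`) are hypothetical.

```python
import secrets

def make_queries(index, n):
    """Split a one-hot selection of `index` into two random-looking bit vectors.

    Each vector on its own is uniformly random, so neither server learns `index`
    (assumption: the two servers do not collude).
    """
    share_a = [secrets.randbits(1) for _ in range(n)]
    share_b = list(share_a)
    share_b[index] ^= 1  # the two shares differ only at the wanted row
    return share_a, share_b

def server_answer(table, query):
    """XOR together every row whose query bit is set (a linear scan of the table)."""
    acc = bytes(len(table[0]))  # all-zero row of the right width
    for row, bit in zip(table, query):
        if bit:
            acc = bytes(x ^ y for x, y in zip(acc, row))
    return acc

def reconstruct(ans_a, ans_b):
    """XOR the two answers; every row except the wanted one cancels out."""
    return bytes(x ^ y for x, y in zip(ans_a, ans_b))

# Toy "embedding table": 8 rows of 4 bytes each.
table = [bytes([i, i + 1, i + 2, i + 3]) for i in range(0, 32, 4)]

q_a, q_b = make_queries(5, len(table))
row = reconstruct(server_answer(table, q_a), server_answer(table, q_b))
assert row == table[5]
```

In this sketch each server touches every row of the table for every query. That per-query scan over the whole table is the kind of cost that makes off-the-shelf PIR expensive for latency-sensitive inference.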
Taxonomy
Topics: Cryptography and Data Security · Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques
