HE-LRM: Efficient Private Embedding Lookups for Neural Inference Using Fully Homomorphic Encryption
Karthik Garimella, Austin Ebel, Gabrielle De Micheli, Brandon Reagen

TL;DR
HE-LRM introduces optimized techniques for encrypted embedding lookups in privacy-preserving neural inference, reporting a 56× speedup over state-of-the-art embedding compression and demonstrating end-to-end encrypted DLRM inference.
Contribution
This work presents novel embedding compression and packing strategies that enable efficient FHE-based inference for models with large sparse embeddings, a previously underexplored area.
Findings
Achieved 56× speedup over state-of-the-art embedding compression.
Demonstrated end-to-end encrypted DLRM inference with practical latency.
Applied embedding-lookup primitives to large language models.
Abstract
Fully Homomorphic Encryption (FHE) allows for computation directly on encrypted data and enables privacy-preserving neural inference in the cloud. Prior work has focused on models with dense inputs (e.g., CNNs), with less attention given to those with sparse inputs such as Deep Learning Recommendation Models (DLRMs). These models require encrypted lookup into large embedding tables that are challenging to implement using FHE's restrictive operators and introduce significant overhead. In this paper, we develop performance optimizations to efficiently support embedding lookups in FHE-based inference pipelines. First, we present an embedding compression technique using client-side digit decomposition that achieves a 56× speedup over state-of-the-art. Next, we propose a multi-embedding packing strategy that enables ciphertext SIMD-parallel lookups across multiple tables. Crucially,…
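To make the idea of client-side digit decomposition concrete, the following is a minimal plaintext sketch in NumPy. It is an illustration under stated assumptions, not the paper's implementation: no encryption is performed, and the function names (`decompose_index`, `client_encode`, `server_lookup`), the base, and the table sizes are all hypothetical. The sketch assumes the client splits a table index into base-`b` digits and sends one short one-hot vector per digit, and that the server rebuilds the full row selector using only multiplications and additions, the kind of operations FHE schemes support.

```python
import numpy as np

def decompose_index(index, base, num_digits):
    """Split an index into base-`base` digits, least-significant first."""
    digits = []
    for _ in range(num_digits):
        digits.append(index % base)
        index //= base
    return digits

def client_encode(index, base, num_digits):
    """Client side (hypothetical): one short one-hot vector per digit.

    In an FHE pipeline these vectors would be encrypted before upload;
    here they stay in plaintext for illustration.
    """
    vectors = []
    for digit in decompose_index(index, base, num_digits):
        v = np.zeros(base)
        v[digit] = 1.0
        vectors.append(v)
    return vectors

def server_lookup(digit_vectors, table):
    """Server side (hypothetical): rebuild the full selector and read a row.

    Only multiplications and additions are used, mirroring the operators
    available under FHE.
    """
    selector = np.ones(1)
    for v in digit_vectors:  # least-significant digit first
        # The outer product places the 1 at position digit * len(selector) + old_position.
        selector = np.outer(v, selector).reshape(-1)
    return selector @ table

# Assumed toy sizes: 64 rows = 4**3, embedding dimension 8.
base, num_digits, dim = 4, 3, 8
table = np.random.default_rng(0).normal(size=(base ** num_digits, dim))
row = server_lookup(client_encode(27, base, num_digits), table)
```

With these assumed sizes the client sends 3 × 4 = 12 values instead of a 64-entry one-hot vector, at the cost of extra multiplications on the server. How the actual scheme balances that trade-off is not specified in this summary.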
