HE-LRM: Efficient Private Embedding Lookups for Neural Inference Using Fully Homomorphic Encryption

Karthik Garimella; Austin Ebel; Gabrielle De Micheli; Brandon Reagen

arXiv:2506.18150·cs.CR·February 23, 2026

TL;DR

HE-LRM introduces techniques that make encrypted embedding lookups efficient in privacy-preserving neural inference, enabling practical encrypted inference for DLRMs and other models that depend on large embedding tables.

Contribution

This work presents embedding compression and packing strategies that enable efficient FHE-based inference for models with sparse inputs and large embedding tables. Prior FHE inference work has concentrated on dense-input models such as CNNs, leaving this setting largely unexplored.
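Under FHE the server cannot branch on, or index memory with, an encrypted value. A common way to express an embedding lookup in this setting (a standard formulation, not necessarily HE-LRM's exact pipeline) is as a matrix-vector product between the table and a client-supplied one-hot selector. The plaintext sketch below shows that equivalence. In a real deployment the selector would be encrypted and the product evaluated homomorphically. The function names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def one_hot(index: int, size: int) -> Vector:
    """Client side: encode a categorical index as a one-hot selector."""
    if not 0 <= index < size:
        raise ValueError("index out of range")
    v = [0.0] * size
    v[index] = 1.0
    return v


def lookup_via_matvec(table: Matrix, selector: Vector) -> Vector:
    """Server side: compute the selector-weighted sum of table rows.

    Only additions and multiplications are used, which arithmetic FHE
    schemes support. With a one-hot selector the result is exactly the
    selected row.
    """
    dim = len(table[0])
    out = [0.0] * dim
    for row, weight in zip(table, selector):
        for j in range(dim):
            out[j] += weight * row[j]
    return out


table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
print(lookup_via_matvec(table, one_hot(2, len(table))))  # [0.5, 0.6]
```

The selector has one slot per table row, so in this formulation the homomorphic work grows with the number of rows N. That cost is what makes large DLRM tables expensive and motivates compressing or restructuring the lookup.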

Findings

01

Achieved a 56× speedup over state-of-the-art embedding compression using client-side digit decomposition.

02

Demonstrated end-to-end encrypted DLRM inference with practical latency.

03

Applied embedding-lookup primitives to large language models.

Abstract

Fully Homomorphic Encryption (FHE) allows for computation directly on encrypted data and enables privacy-preserving neural inference in the cloud. Prior work has focused on models with dense inputs (e.g., CNNs), with less attention given to those with sparse inputs such as Deep Learning Recommendation Models (DLRMs). These models require encrypted lookup into large embedding tables that are challenging to implement using FHE's restrictive operators and introduce significant overhead. In this paper, we develop performance optimizations to efficiently support embedding lookups in FHE-based inference pipelines. First, we present an embedding compression technique using client-side digit decomposition that achieves a 56× speedup over state-of-the-art. Next, we propose a multi-embedding packing strategy that enables ciphertext SIMD-parallel lookups across multiple tables. Crucially,…
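The abstract names client-side digit decomposition as the compression step but does not spell out the construction. The sketch below illustrates one plausible reading, offered purely as an assumption and modeled on compositional embeddings. The client splits an index into base-b digits and one-hot encodes each digit. An N = b^d row table is then replaced by d tables of b rows each, and the selector shrinks from N slots to d·b. The way sub-embeddings are combined (summed here) and all function names are hypothetical and may differ from HE-LRM's actual method. Everything runs in plaintext.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def digits(index: int, base: int, num_digits: int) -> List[int]:
    """Client side: base-`base` digits of `index`, least significant first."""
    if not 0 <= index < base ** num_digits:
        raise ValueError("index out of range")
    out = []
    for _ in range(num_digits):
        index, d = divmod(index, base)
        out.append(d)
    return out


def one_hot(index: int, size: int) -> Vector:
    """Client side: one-hot selector for a single digit."""
    v = [0.0] * size
    v[index] = 1.0
    return v


def matvec(table: Matrix, selector: Vector) -> Vector:
    """Server side: selector-weighted row sum (additions/multiplications only)."""
    dim = len(table[0])
    out = [0.0] * dim
    for row, weight in zip(table, selector):
        for j in range(dim):
            out[j] += weight * row[j]
    return out


def compositional_lookup(sub_tables: List[Matrix], index: int, base: int) -> Vector:
    """Hypothetical compressed lookup: one small table per digit, results summed."""
    selectors = [one_hot(d, base) for d in digits(index, base, len(sub_tables))]
    dim = len(sub_tables[0][0])
    out = [0.0] * dim
    for table, sel in zip(sub_tables, selectors):
        for j, x in enumerate(matvec(table, sel)):
            out[j] += x
    return out


# Two 10-row sub-tables stand in for a 100-row table (base 10, two digits).
sub_tables = [
    [[float(d), 0.0] for d in range(10)],
    [[0.0, 10.0 * d] for d in range(10)],
]
print(compositional_lookup(sub_tables, 47, base=10))  # [7.0, 40.0]
```

In this toy setting, 20 selector slots address 100 logical rows. The trade-off is that rows are no longer independent parameters, because each one is a sum of shared sub-embeddings.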