Encryption-Friendly LLM Architecture
Donghwan Rho, Taeseong Kim, Minje Park, Jung Woo Kim, Hyunsik Chae, Ernest K. Ryu, Jung Hee Cheon

TL;DR
This paper introduces an encryption-friendly transformer architecture for large language models that enables privacy-preserving inference with significant computational speedups, making private LLM services more feasible.
Contribution
The authors propose a modified HE-friendly transformer architecture that uses LoRA fine-tuning and Gaussian kernels. It achieves 6.94x faster fine-tuning and 2.3x faster inference while maintaining performance comparable to plaintext models.
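To illustrate why LoRA shrinks the fine-tuning workload, here is a minimal plain-Python sketch. It is not the authors' implementation. The pretrained weight W stays frozen, and only two low-rank factors A and B are trained, so the effective weight is W + (alpha / r) A B. Several details are assumptions taken from common LoRA practice rather than from this page: the function names, the alpha / r scaling, zero-initializing B, and the 768-dimensional example. The sketch runs in plaintext and performs no encryption.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(len(b))) for j in range(cols)] for row in a]


def matadd(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[s * x for x in row] for row in a]


def lora_forward(x, w, a, b, alpha, r):
    """Hypothetical LoRA layer: y = x W + (alpha / r) * (x A) B.

    Shapes: x (n, d_in), W (d_in, d_out) frozen,
    A (d_in, r) and B (r, d_out) are the only trainable factors.
    """
    base = matmul(x, w)
    update = matmul(matmul(x, a), b)
    return matadd(base, scale(update, alpha / r))


def trainable_params(d_in, d_out, r):
    """Parameter counts: full fine-tuning of W versus training only A and B."""
    return d_in * d_out, r * (d_in + d_out)


if __name__ == "__main__":
    random.seed(0)
    n, d_in, d_out, r = 2, 4, 3, 2
    x = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(n)]
    w = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]
    a = [[random.gauss(0, 0.01) for _ in range(r)] for _ in range(d_in)]
    b = [[0.0] * d_out for _ in range(r)]  # B starts at zero, so the adapter is initially a no-op
    print(lora_forward(x, w, a, b, alpha=16, r=r) == matmul(x, w))
    print(trainable_params(768, 768, 8))
```

Take a 768 x 768 projection at rank 8. The adapter has 12,288 trainable parameters, compared with 589,824 for the full matrix. Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly. This sketch does not claim that the parameter reduction maps one-to-one onto encrypted fine-tuning cost.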
Findings
6.94x speedup for fine-tuning
2.3x speedup for inference
Maintains performance comparable to plaintext models
Abstract
Large language models (LLMs) offer personalized responses based on user interactions, but this use case raises serious privacy concerns. Homomorphic encryption (HE) is a cryptographic protocol supporting arithmetic computations in encrypted states and provides a potential solution for privacy-preserving machine learning (PPML). However, the computational intensity of transformers poses challenges for applying HE to LLMs. In this work, we propose a modified HE-friendly transformer architecture with an emphasis on inference following personalized (private) fine-tuning. Utilizing LoRA fine-tuning and Gaussian kernels, we achieve significant computational speedups -- 6.94x for fine-tuning and 2.3x for inference -- while maintaining performance comparable to plaintext models. Our findings provide a viable proof of concept for offering privacy-preserving LLM services in areas where data…
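HE supports only arithmetic on encrypted data, which is why softmax attention is costly under HE. Each attention row typically needs a maximum, exponentials, and a division. The plain-Python sketch below contrasts standard softmax attention with a Gaussian-kernel score exp(-||q_i - k_j||^2 / (2 * gamma)). It is a plaintext illustration only. The bandwidth parameter gamma, the function names, and the choice to use the kernel scores without row normalization are assumptions for illustration, not the paper's exact formulation. Under HE, the remaining exponential would still have to be evaluated through a polynomial approximation.

```python
import math


def softmax_attention(q, k, v):
    """Scaled dot-product attention; each row needs a max, exponentials, and a division."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(x * y for x, y in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)  # subtracted for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [wt / total for wt in weights]  # row normalization: a division
        out.append([sum(wt * vj[c] for wt, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def gaussian_kernel_attention(q, k, v, gamma):
    """Hypothetical Gaussian-kernel attention: score_ij = exp(-||q_i - k_j||^2 / (2 * gamma)).

    Scores are used as-is (no max, no row normalization), so only subtraction,
    squaring, summation, one exponential per score, and products with V remain.
    """
    out = []
    for qi in q:
        weights = [
            math.exp(-sum((x - y) ** 2 for x, y in zip(qi, kj)) / (2 * gamma))
            for kj in k
        ]
        out.append([sum(wt * vj[c] for wt, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    q = [[0.0, 0.0], [1.0, 1.0]]
    k = [[0.0, 0.0], [1.0, 0.0]]
    v = [[1.0, 0.0], [2.0, 1.0]]
    print(softmax_attention(q, k, v))
    print(gaussian_kernel_attention(q, k, v, gamma=0.5))
```

Expanding the square gives exp(-||q - k||^2 / (2 gamma)) = exp(q . k / gamma) * exp(-||q||^2 / (2 gamma)) * exp(-||k||^2 / (2 gamma)). The kernel therefore keeps a dot-product term like the one in softmax's numerator while dropping the normalizing denominator. This page does not cover how the paper compensates for the missing normalization.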
Peer Reviews
Decision: ICLR 2025 Poster
The problem statement is well motivated. Several works are currently exploring evaluation of LLMs under FHE, and the potential applications are also quite compelling. This is an extremely difficult task in terms of achieving viable efficiency, and any method that advances the state-of-the-art in this direction is welcome. The results achieved here show that several optimizations in other domains, that is, LoRA and the use of Gaussian Kernels, turn out to be useful for the evaluation of LLMs in F
I am not particularly impressed by the novelty of this paper. It uses existing FHE tools with existing ML optimizations. This may not be a weakness on its own given the positive results of combining these techniques, but I still think the improvement factors may not be big enough for these techniques to become "enablers" of private LLM applications in practice. Put differently, I am not convinced that the gains here are significant enough to overcome the blockers that prevent LLMs + FHE from b
1. Compared to MPC approaches, the non-interactive property of HE makes it feasible to compute over large-scale LLMs without incurring considerable communication overhead among computing parties. 2. This work has a strong focus on the fine-tuning stage to make LLMs secure for users. It concentrates on key components such as the attention layers in the transformer and combines them with SoTA techniques like LoRA to make the process more efficient. 3. Writing with bottleneck-improvement pa
1. When you mention SoTA LLMs, you should note that decoder-based models have proven very powerful in generative tasks. Over the past few years, the BERT series has become less useful and less prevalent than decoder models. Hence, protecting BERT-based models matters less for current LLMs. 2. Although this work explains how HE and CKKS work in a secure way, it does not specify the adversary model, such as the abilities of adversaries or the type of adversaries (e.g., semi
This paper addresses the inefficiency of the transformer architecture under HE, a problem on which existing research has already produced rich results. The main contribution of this paper is to speed up the transformer architecture under HE, and the authors have carried out many experiments to verify the soundness and advantages of the scheme. I think the experiments in this paper are thorough, and the advantages of this paper are elaborated in terms of speed and model performance, which s
1. Insufficient innovation. First, the topic chosen for this paper is widely studied. Second, the solutions in this paper seem to be a direct combination and application of existing advanced schemes, and the paper does not make it obvious how the authors have improved on existing methods. 2. The description in Section 2.1 does not seem to be consistent with Figure 1. Furthermore, why does the statement “LLM weights are protected in the strict cryptographic sense (line 149)” hold? 3.
Taxonomy
Topics: Cryptography and Data Security · Advanced Data Storage Technologies
