GPU Acceleration of TFHE-Based High-Precision Nonlinear Layers for Encrypted LLM Inference
Guoci Chen, Xiurui Pan, Qiao Li, Bo Mao, Congming Gao, Chengying Huan, Mingzhe Zhang, Jie Zhang

TL;DR
TIGER is a GPU-accelerated framework that enables high-precision nonlinear layer evaluation for encrypted LLM inference using TFHE, significantly improving efficiency and accuracy over CPU methods.
Contribution
TIGER introduces the first GPU-accelerated, high-precision TFHE framework for nonlinear LLM layers, combining novel algorithms with batch processing to speed up encrypted inference.
Findings
Achieves speedups of more than 7x, and up to 17x, on nonlinear layers relative to the CPU baseline.
Provides high-precision implementations of GELU, Softmax, and LayerNorm in encrypted form.
Surpasses native lookup-table precision limits with GPU-optimized algorithms.
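The "native lookup-table precision limit" mentioned above can be illustrated in plaintext. The following is a minimal sketch, not TIGER's implementation and with no encryption involved: it tabulates GELU at 2^bits evenly spaced points and measures the worst-case error of a nearest-lower-entry lookup. The input range [-4, 4), the bit widths, and all function names are assumptions chosen for illustration.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU via the Gaussian error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def build_lut(bits: int, lo: float, hi: float) -> list[float]:
    """Tabulate GELU at 2**bits evenly spaced inputs in [lo, hi)."""
    n = 1 << bits
    step = (hi - lo) / n
    return [gelu(lo + i * step) for i in range(n)]


def lut_eval(lut: list[float], x: float, lo: float, hi: float) -> float:
    """Quantize x to a table index (floor) and return the stored value."""
    n = len(lut)
    idx = int((x - lo) / (hi - lo) * n)
    idx = min(max(idx, 0), n - 1)  # clamp inputs outside [lo, hi)
    return lut[idx]


def max_abs_error(bits: int, lo: float = -4.0, hi: float = 4.0,
                  samples: int = 10_000) -> float:
    """Worst observed |LUT(x) - GELU(x)| over a uniform sample grid."""
    lut = build_lut(bits, lo, hi)
    worst = 0.0
    for k in range(samples):
        x = lo + (hi - lo) * k / samples
        worst = max(worst, abs(lut_eval(lut, x, lo, hi) - gelu(x)))
    return worst


if __name__ == "__main__":
    # Hypothetical bit widths; a small table gives a coarse result.
    for bits in (4, 8, 12):
        print(f"{bits:2d}-bit table: max abs error = {max_abs_error(bits):.6f}")
```

Each additional input bit doubles the table size while roughly halving the error, which is why a lookup restricted to a few bits per ciphertext cannot reach high precision on its own.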
Abstract
Deploying large language models (LLMs) as cloud services raises privacy concerns, since inference may leak sensitive data. Fully Homomorphic Encryption (FHE) allows computation on encrypted data, but current FHE methods struggle to evaluate nonlinear functions both efficiently and precisely. Specifically, CKKS-based approaches require high-degree polynomial approximations, which become costly as the target precision increases. Alternatively, TFHE's Programmable Bootstrapping (PBS) outperforms CKKS by offering exact lookup-table evaluation, but it lacks high-precision implementations of LLM nonlinear layers and underutilizes GPU resources. We propose TIGER, the first GPU-accelerated framework for high-precision TFHE-based nonlinear LLM layer evaluation. TIGER offers: (1) a GPU-optimized WoP-PBS method combined with numerical algorithms to surpass native lookup-table precision limits on nonlinear…
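The abstract's remark that polynomial approximations grow costly with precision can be seen in a plaintext sketch. The code below is an illustration under stated assumptions, not a CKKS implementation and not part of TIGER: it fits Chebyshev polynomials of increasing degree to exp(x) on [-8, 0] (a range chosen here because Softmax inputs are commonly shifted to be non-positive) and reports the maximum error. The interval, the degrees, and the function names are all hypothetical.

```python
import math


def cheb_coeffs(f, degree: int, lo: float, hi: float) -> list[float]:
    """Chebyshev interpolation coefficients of f on [lo, hi]."""
    n = degree + 1
    # Chebyshev nodes on [-1, 1], mapped to [lo, hi] for sampling f.
    nodes = [math.cos(math.pi * (j + 0.5) / n) for j in range(n)]
    vals = [f(0.5 * (hi - lo) * t + 0.5 * (hi + lo)) for t in nodes]
    coeffs = []
    for k in range(n):
        s = sum(vals[j] * math.cos(math.pi * k * (j + 0.5) / n)
                for j in range(n))
        coeffs.append(2.0 * s / n)
    coeffs[0] *= 0.5  # the constant term carries a factor of 1/2
    return coeffs


def cheb_eval(coeffs: list[float], x: float, lo: float, hi: float) -> float:
    """Evaluate the Chebyshev series at x with Clenshaw's recurrence."""
    t = (2.0 * x - lo - hi) / (hi - lo)
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * t * b1 - b2 + c, b1
    return t * b1 - b2 + coeffs[0]


def max_abs_error(degree: int, lo: float = -8.0, hi: float = 0.0,
                  samples: int = 2_000) -> float:
    """Worst observed |p(x) - exp(x)| over a uniform sample grid."""
    coeffs = cheb_coeffs(math.exp, degree, lo, hi)
    return max(
        abs(cheb_eval(coeffs, lo + (hi - lo) * k / samples, lo, hi)
            - math.exp(lo + (hi - lo) * k / samples))
        for k in range(samples + 1)
    )


if __name__ == "__main__":
    for degree in (3, 7, 15):
        print(f"degree {degree:2d}: max abs error = {max_abs_error(degree):.3e}")
```

Tighter error targets require higher degrees, and under encryption a higher degree means more ciphertext multiplications and greater multiplicative depth, which is the cost the abstract refers to.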
