AEGIS: Scaling Long-Sequence Homomorphic Encrypted Transformer Inference via Hybrid Parallelism on Multi-GPU Systems

Zhaoting Gong, Ran Ran, Fan Yao, and Wujie Wen

arXiv:2604.03425 · cs.CR · April 7, 2026

TL;DR

AEGIS is a system for long-sequence encrypted Transformer inference on multi-GPU platforms. It reduces inter-GPU communication by up to 81.3% and per-device memory by 69.1% while reaching 96.62% scaling efficiency on four GPUs.

Contribution

AEGIS proposes a device placement strategy derived from ciphertext dependencies, which reduces inter-GPU communication and improves the scalability of multi-GPU homomorphic Transformer inference.

Findings

01. Reduces inter-GPU communication by up to 81.3%.

02. Achieves 96.62% scaling efficiency on four GPUs.

03. Attains a 3.86x end-to-end speedup and a 69.1% per-device memory reduction.
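
The scaling-efficiency and speedup figures can be related by a short calculation. The sketch below assumes the common definition of parallel efficiency (speedup divided by device count) and assumes the 3.86x speedup was measured on four GPUs; this summary states neither, so treat the link between the two numbers as illustrative only. The function names are hypothetical.

```python
def scaling_efficiency(speedup: float, num_devices: int) -> float:
    """Parallel efficiency under the assumed definition: speedup / device count."""
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    return speedup / num_devices


def implied_speedup(efficiency: float, num_devices: int) -> float:
    """Speedup implied by a given efficiency under the same assumed definition."""
    return efficiency * num_devices


# Assumption: the reported 3.86x speedup refers to the four-GPU configuration.
eff = scaling_efficiency(3.86, 4)
spd = implied_speedup(0.9662, 4)

print(f"efficiency from rounded speedup: {eff:.4f}")
print(f"speedup implied by 96.62%:       {spd:.4f}")
```

Under these assumptions, 96.62% efficiency on four GPUs implies a speedup of about 3.8648x, which rounds to the reported 3.86x.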

Abstract

Fully Homomorphic Encryption (FHE) enables privacy-preserving Transformer inference, but long-sequence encrypted Transformers quickly exceed single-GPU memory capacity because encoded weights are already large and encrypted activations grow rapidly with sequence length. Multi-GPU execution therefore becomes unavoidable, yet scaling remains challenging because communication is jointly induced by application-level aggregation and encryption-level RNS coupling. Existing approaches either synchronize between devices frequently or replicate encrypted tensors across devices, leading to excessive communication and latency. We present AEGIS, an Application-Encryption Guided Inference System for scalable long-sequence encrypted Transformer inference on multi-GPU platforms. AEGIS derives device placement from ciphertext dependencies jointly induced by Transformer dataflow and CKKS polynomial…
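
The "RNS coupling" mentioned in the abstract can be illustrated with a toy residue number system. This is a generic sketch of RNS arithmetic, not the AEGIS placement algorithm: the moduli, values, and function names are assumptions chosen for readability, and real CKKS parameters use many much larger primes. It shows that multiplication proceeds independently per residue (so residues could in principle live on different devices), while reconstruction reads every residue at once.

```python
from math import prod

# Hypothetical small pairwise-coprime moduli, for illustration only.
MODULI = [97, 101, 103, 107]
Q = prod(MODULI)


def to_rns(x: int) -> list[int]:
    """Split an integer into one residue (limb) per modulus."""
    return [x % q for q in MODULI]


def rns_mul(a: list[int], b: list[int]) -> list[int]:
    """Limb-wise multiply: each limb uses only its own modulus, no cross-limb data."""
    return [(ai * bi) % q for ai, bi, q in zip(a, b, MODULI)]


def from_rns(limbs: list[int]) -> int:
    """CRT reconstruction: needs all limbs together, which couples them."""
    total = 0
    for r, q in zip(limbs, MODULI):
        q_hat = Q // q
        total += r * q_hat * pow(q_hat, -1, q)  # modular inverse (Python 3.8+)
    return total % Q


a, b = 123456, 654
product_limbs = rns_mul(to_rns(a), to_rns(b))
assert from_rns(product_limbs) == (a * b) % Q
```

If each limb were placed on a separate GPU, `rns_mul` would need no communication, whereas any step shaped like `from_rns` would have to gather limbs across devices.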
