AERO: Entropy-Guided Framework for Private LLM Inference
Nandan Kumar Jha, Brandon Reagen

TL;DR
AERO is a framework that removes nonlinear operations from transformer-based language models to make privacy-preserving inference cheaper, while regulating attention entropy to keep training stable and attention heads diverse.
Contribution
It introduces an entropy-guided approach with adaptive regularization to strategically eliminate nonlinearities without performance loss.
Findings
Achieves 3.4× reduction in communication overhead
Reduces latency by 1.4×
Maintains model performance during nonlinear elimination
Abstract
Privacy-preserving computation enables language model inference directly on encrypted data, yet it suffers from prohibitive latency and communication overheads, primarily due to nonlinear functions. Removing nonlinearities, however, can trigger one of two failure modes that limit how far removal can go: entropy collapse in deeper layers, which destabilizes training, and entropic overload in early layers, which causes under-utilization of attention heads. To address these challenges, we introduce AERO, an entropy-guided framework that strategically eliminates costly nonlinear operations from transformer architectures. It employs adaptive recalibration through a head-wise entropy regularizer with learnable per-head strengths, enabling each head to adjust its entropy level while penalizing extreme entropies and fostering functional diversity through a tolerance margin.…
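The abstract is truncated before it gives the regularizer's formula, so the following is only a minimal sketch of what a head-wise entropy penalty with per-head strengths and a tolerance margin could look like. The function names, the target fraction `target_frac`, the squared-excess form, and the way the margin is scaled are all assumptions made for illustration, not the authors' definition. In training, the `strengths` values would be learnable parameters; here they are plain floats.

```python
import math

def attention_entropy(probs):
    """Shannon entropy (in nats) of one attention row that sums to 1."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def head_entropy(attn):
    """Mean entropy over the query rows of one head's attention matrix."""
    return sum(attention_entropy(row) for row in attn) / len(attn)

def entropy_penalty(heads, strengths, target_frac=0.5, tolerance=0.1):
    """Hypothetical head-wise entropy regularizer (illustrative only).

    heads:      list of attention matrices, one per head (rows sum to 1)
    strengths:  one non-negative weight per head (learnable in training)
    target_frac, tolerance: assumed fractions of the maximum entropy
    """
    total = 0.0
    for attn, lam in zip(heads, strengths):
        max_ent = math.log(len(attn[0]))  # entropy of uniform attention
        deviation = abs(head_entropy(attn) - target_frac * max_ent)
        # Deviations inside the tolerance margin are not penalized,
        # so heads remain free to differ from one another.
        excess = max(0.0, deviation - tolerance * max_ent)
        total += lam * excess ** 2
    return total / len(heads)

# Three single-query heads over four keys: collapsed, moderate, uniform.
collapsed = [[1.0, 0.0, 0.0, 0.0]]     # entropy 0 (entropy collapse)
moderate = [[0.5, 0.5, 0.0, 0.0]]      # entropy log 2
uniform = [[0.25, 0.25, 0.25, 0.25]]   # entropy log 4 (entropic overload)
penalty = entropy_penalty([collapsed, moderate, uniform], [1.0, 1.0, 1.0])
```

Under these assumed settings, only the collapsed and uniform heads contribute to `penalty`; the moderate head sits inside the margin and is left alone, which is the behavior the abstract describes as penalizing extreme entropies while preserving diversity.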
Taxonomy
Topics: Cryptography and Data Security · Advanced Data Storage Technologies · Security and Verification in Computing
Methods: Entropy Regularization
