Entropy-Guided Attention for Private LLMs
Nandan Kumar Jha, Brandon Reagen

TL;DR
This paper introduces an information-theoretic framework using entropy to optimize transformer architectures for private inference in language models, addressing communication and latency challenges.
Contribution
It proposes an entropy-guided attention mechanism and regularization techniques to improve privacy-preserving language models by controlling entropy dynamics.
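One generic way to "control entropy dynamics" is to add a penalty to the training loss that pushes each attention head's entropy back toward a target band. The sketch below is hypothetical and is not the authors' formulation. It assumes per-head mean entropies have already been measured. The names `entropy_penalty` and `total_loss`, the parameters `target_frac`, `tolerance` and `reg_weight`, and their default values are all illustrative. A real training loop would write this in an autodiff framework so the penalty can be backpropagated.

```python
import math
from typing import Sequence


def entropy_penalty(head_entropies: Sequence[float], n_keys: int,
                    target_frac: float = 0.5, tolerance: float = 0.1) -> float:
    """Hypothetical regularizer: mean squared deviation of per-head entropy
    outside a tolerance band around a target.

    target_frac and tolerance are assumed values, expressed as fractions of
    the maximum possible entropy log(n_keys).
    """
    max_h = math.log(n_keys)
    target = target_frac * max_h
    margin = tolerance * max_h
    total = 0.0
    for h in head_entropies:
        # Heads inside [target - margin, target + margin] cost nothing;
        # heads drifting toward collapse (low) or overload (high) are penalized.
        excess = max(0.0, abs(h - target) - margin)
        total += excess ** 2
    return total / len(head_entropies)


def total_loss(task_loss: float, head_entropies: Sequence[float],
               n_keys: int, reg_weight: float = 0.01) -> float:
    # Task loss plus a weighted entropy penalty (reg_weight is an assumption).
    return task_loss + reg_weight * entropy_penalty(head_entropies, n_keys)


# Heads sitting exactly on the target incur no penalty.
print(entropy_penalty([0.5 * math.log(4)] * 3, n_keys=4))  # → 0.0
```

Penalizing only deviations beyond a tolerance band leaves heads free to specialize within a range instead of forcing every head to the same entropy.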
Findings
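Removing nonlinearities causes two failure modes: entropy collapse, which destabilizes training, and entropic overload, in which many attention heads have near-maximal entropy and lose their diversity.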
Entropy-guided attention improves model stability and efficiency.
Proposed methods enable more practical private inference with LLMs.
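The two failure modes in the findings can be illustrated by measuring each head's average attention entropy relative to its maximum, log(number of keys). The following is a minimal diagnostic sketch. It assumes full (non-causal) attention matrices whose rows are probability distributions. The names `classify_head`, `low_frac` and `high_frac`, and the 0.1/0.9 thresholds are hypothetical choices made for illustration, not values from the paper.

```python
import math
from typing import List

Matrix = List[List[float]]


def row_entropy(probs: List[float]) -> float:
    # Shannon entropy in nats; terms with p == 0 contribute 0 (0 * log 0 := 0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def head_entropy(attn: Matrix) -> float:
    # Average entropy over all query rows of one head's attention matrix.
    return sum(row_entropy(row) for row in attn) / len(attn)


def classify_head(attn: Matrix, low_frac: float = 0.1,
                  high_frac: float = 0.9) -> str:
    """Label a head by its entropy as a fraction of log(n_keys).

    Thresholds are hypothetical: near 0 suggests collapse (near one-hot
    attention), near 1 suggests overload (near-uniform attention).
    """
    n_keys = len(attn[0])
    if n_keys < 2:
        raise ValueError("need at least two keys for a non-trivial entropy")
    frac = head_entropy(attn) / math.log(n_keys)
    if frac < low_frac:
        return "collapse"
    if frac > high_frac:
        return "overload"
    return "normal"


uniform = [[0.25] * 4 for _ in range(4)]
one_hot = [[1.0, 0.0, 0.0, 0.0] for _ in range(4)]
mixed = [[0.5, 0.3, 0.2, 0.0] for _ in range(4)]
print([classify_head(a) for a in (uniform, one_hot, mixed)])
# → ['overload', 'collapse', 'normal']
```

Normalizing by log(n_keys) makes heads comparable across sequence lengths; with causal masking, each row would instead be normalized by the log of its own number of visible keys.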
Abstract
The pervasiveness of proprietary language models has raised critical privacy concerns, necessitating advancements in private inference (PI), where computations are performed directly on encrypted data without revealing users' sensitive information. While PI offers a promising solution, its practical deployment is hindered by substantial communication and latency overheads, primarily stemming from nonlinear operations. To address this, we introduce an information-theoretic framework to characterize the role of nonlinearities in decoder-only language models, laying a principled foundation for optimizing transformer architectures tailored to the demands of PI. By leveraging Shannon's entropy as a quantitative measure, we uncover the previously unexplored dual significance of nonlinearities: beyond ensuring training stability, they are crucial for maintaining attention head diversity.…
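For reference, the standard Shannon entropy of one attention row in a decoder-only (causally masked) model can be written as below. The notation is assumed here rather than taken from the paper: $A^{(h)}$ is head $h$'s attention matrix, $Q^{(h)}, K^{(h)}$ its queries and keys, $d_k$ the key dimension, and $M$ the causal mask ($0$ on and below the diagonal, $-\infty$ above it), so query $i$ sees $i$ keys.

```latex
A^{(h)} = \operatorname{softmax}\!\left(\frac{Q^{(h)} {K^{(h)}}^{\top}}{\sqrt{d_k}} + M\right),
\qquad
H^{(h)}_i = -\sum_{j=1}^{i} A^{(h)}_{ij} \log A^{(h)}_{ij},
\qquad
0 \le H^{(h)}_i \le \log i
```

The lower bound is reached when a row puts all its weight on a single key, and the upper bound when it spreads weight uniformly over all $i$ visible keys.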
Taxonomy
Topics: Digital Rights Management and Security
Methods: Softmax · Attention Is All You Need · Layer Normalization · Entropy Regularization
