Private Transformer Inference in MLaaS: A Survey
Yang Li, Xinyu Zhou, Yitong Wang, Liangxin Qian, Jun Zhao

TL;DR
This survey reviews recent advances in Private Transformer Inference (PTI) techniques that enable privacy-preserving AI model deployment in MLaaS, emphasizing cryptographic methods, challenges, and evaluation frameworks.
Contribution
It introduces a structured taxonomy and evaluation framework for PTI, highlighting recent solutions and addressing the balance between privacy, efficiency, and high-performance inference.
Findings
An overview of the cryptographic techniques behind PTI, such as secure multi-party computation and homomorphic encryption.
Identification of key challenges in resource efficiency and privacy trade-offs.
Proposed evaluation framework for PTI solutions.
Abstract
Transformer models have revolutionized AI, powering applications like content generation and sentiment analysis. However, their deployment in Machine Learning as a Service (MLaaS) raises significant privacy concerns, primarily due to the centralized processing of sensitive user data. Private Transformer Inference (PTI) offers a solution by utilizing cryptographic techniques such as secure multi-party computation and homomorphic encryption, enabling inference while preserving both user data and model privacy. This paper reviews recent PTI advancements, highlighting state-of-the-art solutions and challenges. We also introduce a structured taxonomy and evaluation framework for PTI, focusing on balancing resource efficiency with privacy and bridging the gap between high-performance inference and data privacy.
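To make the idea of secure multi-party computation a little more concrete, the following is a minimal sketch of two-party additive secret sharing applied to a single linear layer. It is not taken from the survey: the ring size `MOD`, the function names, and the assumption that both parties know the integer weights in the clear are illustrative choices only. Real PTI protocols also protect the model and need dedicated sub-protocols for non-linear operations such as Softmax and Layer Normalization.

```python
import secrets

MOD = 2 ** 32  # ring Z_{2^32}; an illustrative choice, not prescribed by the survey


def share(x):
    """Split each integer in x into two additive shares modulo MOD."""
    s0 = [secrets.randbelow(MOD) for _ in x]       # uniformly random mask
    s1 = [(xi - r) % MOD for xi, r in zip(x, s0)]  # x = s0 + s1 (mod MOD)
    return s0, s1


def linear(weights, vec):
    """Matrix-vector product modulo MOD, computed locally on one share."""
    return [sum(w * v for w, v in zip(row, vec)) % MOD for row in weights]


def reconstruct(a, b):
    """Recombine two additive shares into the plaintext values."""
    return [(ai + bi) % MOD for ai, bi in zip(a, b)]


def private_linear(weights, x):
    """Hypothetical end-to-end flow: share the input, compute per party, recombine."""
    s0, s1 = share(x)        # the client splits its input
    y0 = linear(weights, s0)  # party 0 sees only s0
    y1 = linear(weights, s1)  # party 1 sees only s1
    return reconstruct(y0, y1)


if __name__ == "__main__":
    W = [[1, 2], [3, 4]]
    x = [5, 6]
    print(private_linear(W, x))
```

Because a linear layer distributes over addition, each party can work on its share independently, and neither share alone reveals the input. This covers only the linear part of a Transformer; the non-linear layers are where much of the cost in PTI tends to arise.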
Taxonomy
Topics: Cryptography and Data Security · Library Science and Information Systems
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Dropout · Layer Normalization · Byte Pair Encoding · Softmax · Absolute Position Encodings
