MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention
Wenxuan Zeng, Meng Li, Wenjie Xiong, Tong Tong, Wen-jie Lu, Jin Tan, Runsheng Wang, Ru Huang

TL;DR
This paper introduces MPCViT, an MPC-friendly Vision Transformer optimized for secure inference, reducing latency significantly while maintaining high accuracy through heterogeneous attention and neural architecture search.
Contribution
The paper proposes MPCViT, a novel MPC-optimized Vision Transformer with heterogeneous attention and a neural architecture search method for efficient secure inference.
Findings
MPCViT achieves up to 6.2x latency reduction with higher accuracy.
MPCViT+ further improves the Pareto front of accuracy and efficiency.
Selective Softmax linearization reduces latency without accuracy loss.
Abstract
Secure multi-party computation (MPC) enables computation directly on encrypted data and protects both data and model privacy in deep learning inference. However, existing neural network architectures, including Vision Transformers (ViTs), are not designed or optimized for MPC and incur significant latency overhead. We observe Softmax accounts for the major latency bottleneck due to a high communication complexity, but can be selectively replaced or linearized without compromising the model accuracy. Hence, in this paper, we propose an MPC-friendly ViT, dubbed MPCViT, to enable accurate yet efficient ViT inference in MPC. Based on a systematic latency and accuracy evaluation of the Softmax attention and other attention variants, we propose a heterogeneous attention optimization space. We also develop a simple yet effective MPC-aware neural architecture search algorithm for fast Pareto…
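The idea of selectively replacing Softmax can be illustrated with a small plaintext sketch. It is not the authors' implementation: the softmax-free variant below (a constant 1/n scaling), the per-head boolean mask `keep_softmax`, and all function names are assumptions chosen for illustration, and no MPC protocol is involved. It only shows how some attention heads could keep Softmax while others use a cheaper linear form.

```python
import numpy as np

def softmax_attention(q, k, v):
    """Standard scaled dot-product attention for a single head."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def scaled_linear_attention(q, k, v):
    """Hypothetical softmax-free stand-in: exp/normalize replaced by a 1/n scale."""
    n = q.shape[0]
    return (q @ k.T / n) @ v

def heterogeneous_attention(q, k, v, keep_softmax):
    """Mix attention types across heads.

    q, k, v: arrays of shape (heads, tokens, dim).
    keep_softmax: one bool per head (an assumed, illustrative selection mask).
    """
    outputs = []
    for h, keep in enumerate(keep_softmax):
        attend = softmax_attention if keep else scaled_linear_attention
        outputs.append(attend(q[h], k[h], v[h]))
    return np.stack(outputs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.normal(size=(4, 6, 8)) for _ in range(3))
    out = heterogeneous_attention(q, k, v, [True, False, False, True])
    print(out.shape)
```

In this sketch the mask is fixed by hand; per the abstract, the paper instead uses an MPC-aware neural architecture search to decide where Softmax is kept.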
Taxonomy
Topics: Advanced Neural Network Applications · Glioma Diagnosis and Treatment · Brain Tumor Detection and Classification
Methods: Attention Is All You Need · Multi-Head Linear Attention · Dense Connections · Residual Connection · Layer Normalization · Knowledge Distillation · Linformer · Linear Layer · Softmax
