PriViT: Vision Transformers for Fast Private Inference

Naren Dhyani; Jianqiao Mo; Minsu Cho; Ameya Joshi; Siddharth Garg; Brandon Reagen; Chinmay Hegde

arXiv:2310.04604·cs.CR·October 10, 2023

TL;DR

PriViT introduces a gradient-based method to modify Vision Transformers, making them more suitable for private inference with secure multi-party computation, while preserving accuracy and improving latency-accuracy trade-offs.

Contribution

The paper presents PriViT, a novel algorithm that selectively Taylorizes nonlinearities in ViTs to enhance MPC efficiency without sacrificing prediction accuracy.

Findings

01

Achieves better latency-accuracy trade-offs compared to existing methods.

02

Demonstrates effectiveness on standard image classification benchmarks.

03

Provides publicly available implementation code.
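The latency-accuracy trade-off in finding 01 is typically read off a Pareto frontier: the set of models for which no other model is both faster and more accurate. The sketch below shows one simple way to extract that frontier from a list of (latency, accuracy) pairs. The function name `pareto_frontier` and the numbers in the usage example are hypothetical placeholders for illustration, not results from the paper.

```python
def pareto_frontier(points):
    """Return the (latency, accuracy) pairs that are not dominated.

    A point is dominated if another point has lower-or-equal latency
    and strictly higher accuracy. Lower latency and higher accuracy
    are both treated as better.
    """
    frontier = []
    best_accuracy = float("-inf")
    # Sort by latency ascending; on ties, put the most accurate point first.
    for latency, accuracy in sorted(points, key=lambda p: (p[0], -p[1])):
        # Keep a point only if it beats every faster point on accuracy.
        if accuracy > best_accuracy:
            frontier.append((latency, accuracy))
            best_accuracy = accuracy
    return frontier


# Hypothetical (latency in seconds, accuracy) pairs, for illustration only.
candidates = [(10, 0.70), (20, 0.80), (15, 0.65), (30, 0.80), (5, 0.50)]
frontier = pareto_frontier(candidates)
```

In this toy input, the points at latency 15 and 30 are dropped because a faster candidate already matches or exceeds their accuracy.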

Abstract

The Vision Transformer (ViT) architecture has emerged as the backbone of choice for state-of-the-art deep models for computer vision applications. However, ViTs are ill-suited for private inference using secure multi-party computation (MPC) protocols, due to the large number of non-polynomial operations (self-attention, feed-forward rectifiers, layer normalization). We propose PriViT, a gradient-based algorithm to selectively "Taylorize" nonlinearities in ViTs while maintaining their prediction accuracy. Our algorithm is conceptually simple, easy to implement, and achieves improved performance over existing approaches for designing MPC-friendly transformer architectures in terms of achieving the Pareto frontier in latency-accuracy. We confirm these improvements via experiments on several standard image classification tasks. Public code is available at…
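To make the idea of "Taylorizing" a nonlinearity concrete, here is a minimal sketch in plain Python. It assumes, purely for illustration, that the polynomial stand-in is the second-order Taylor expansion of GELU around zero and that a scalar switch `c` in [0, 1] blends the exact and polynomial versions. The names `gelu_taylor2` and `switched_activation`, and this particular choice of polynomial, are assumptions and may differ from what the PriViT code actually uses.

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_taylor2(x):
    """Second-order Taylor expansion of GELU around x = 0.

    GELU'(0) = 1/2 and GELU''(0) = 2 / sqrt(2*pi), so
    GELU(x) ~ x/2 + x^2 / sqrt(2*pi). This is a polynomial, which is
    the kind of operation MPC protocols evaluate cheaply.
    """
    return 0.5 * x + x * x / math.sqrt(2.0 * math.pi)


def switched_activation(x, c):
    """Blend the exact and polynomial activations (hypothetical helper).

    c = 1 keeps the exact GELU; c = 0 uses only the polynomial.
    In a gradient-based scheme, c would be a trainable parameter.
    """
    return c * gelu(x) + (1.0 - c) * gelu_taylor2(x)
```

Near zero the two functions agree closely, and the approximation degrades as |x| grows. This is one reason such a replacement would be applied selectively rather than everywhere.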

Peer Reviews

Decision·Submitted to ICLR 2024

Reviewer 01 · Rating 5 · marginally below the acceptance threshold · Confidence 3

Strengths

By training GELU and softmax with a switching method, the authors find a different optimal configuration for each layer.

Weaknesses

Compared with the prior work, MPCViT, the method shows better results on TinyImageNet but worse latency on CIFAR-100. In terms of accuracy, it is superior in every case. The paper states that DELPHI is the protocol it focuses on: "In this paper, our focus is exclusively on the DELPHI protocol (Mishra et al., 2020a) for private inference. We choose DELPHI as a matter of convenience;" However, the reported results do not include any performance comparison with DELPHI.

Reviewer 02 · Rating 5 · marginally below the acceptance threshold · Confidence 2

Strengths

1. The method analysis is clear and the latency breakdown is helpful.
2. The experiments on several image classification benchmarks are solid and comprehensive.
3. The proposed method is faster than the previous SOTA model and achieves competitive performance.

Weaknesses

1. More detail is needed on the knowledge distillation part.
2. More discussion of the non-linearity distribution is needed.

Reviewer 03 · Rating 6 · marginally above the acceptance threshold · Confidence 3

Strengths

* The paper is well-motivated, as the deployment of ViTs in private scenarios is becoming increasingly important and current approaches are not tailored to the Transformer architecture.
* The proposed method is quite simple yet effective compared with SOTA approaches.

Weaknesses

[Major]
1. **Experiments:** The authors have conducted sufficient comparative experiments on 3 datasets (CIFAR-10/100 and TinyImageNet). However, the image resolutions are no more than $64\times 64$, which is rather small compared with commonly used datasets like ImageNet-1k and Caltech-101/256. It would be interesting to see ImageNet results and a comparison with SENet if possible.
2. **Experiments:** The authors' exclusive use of ViT-Tiny for comparison is insufficient to establish the me

Code & Models

Repositories

nyu-dice-lab/privit · jax · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices

Methods: Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Linear Layer · Label Smoothing · Adam · Absolute Position Encodings · Residual Connection · Layer Normalization