Ditto: Quantization-aware Secure Inference of Transformers upon MPC

Haoqi Wu, Wenjing Fang, Yancheng Zheng, Junming Ma, Jin Tan, Yinggui Wang, Lei Wang

arXiv:2405.05525 · cs.CR · May 10, 2024

TL;DR

Ditto introduces a quantization-aware framework for secure Transformer inference using MPC, significantly reducing computation and communication overhead while maintaining model utility, demonstrated on BERT and GPT-2 models.

Contribution

Ditto integrates quantization-aware techniques into MPC-based secure Transformer inference, introducing new MPC primitives for the type conversions that quantization requires and a quantization-aware distillation procedure to preserve accuracy.
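Here "type conversion" means moving a secret-shared value between rings of different bit widths (for example, between 32-bit and 64-bit). The Python sketch below is a generic illustration of why this needs care, not Ditto's actual protocol: it assumes two-party additive secret sharing over Z_{2^k}, and the helper names (`share`, `reveal`, `downcast`, `upcast`) are hypothetical. Downcasting is purely local, whereas upcasting must correct for the wrap-around of the shares; the wrap bit is computed in the clear here only for illustration, where a real protocol would have to compute it under MPC.

```python
import secrets

def share(x: int, k: int) -> tuple[int, int]:
    """Split a signed integer x into two additive shares over Z_{2^k}."""
    mod = 1 << k
    r = secrets.randbelow(mod)
    return r, (x - r) % mod

def reveal(s0: int, s1: int, k: int) -> int:
    """Reconstruct the shared value and read it as a signed k-bit integer."""
    mod = 1 << k
    v = (s0 + s1) % mod
    return v - mod if v >= mod // 2 else v

def downcast(s0: int, s1: int, k_to: int) -> tuple[int, int]:
    """Local: reducing each share mod 2^k_to yields shares of x mod 2^k_to."""
    mod = 1 << k_to
    return s0 % mod, s1 % mod

def upcast(s0: int, s1: int, k_from: int, k_to: int) -> tuple[int, int]:
    """Lift shares from Z_{2^k_from} to Z_{2^k_to}; x must fit in signed k_from bits."""
    mod_from, mod_to = 1 << k_from, 1 << k_to
    offset = 1 << (k_from - 1)
    # Party 0 adds a public offset so the shared value y = x + offset lies in [0, 2^k_from).
    s0 = (s0 + offset) % mod_from
    # Over the integers, s0 + s1 = y + w * 2^k_from for a wrap bit w in {0, 1}.
    # ASSUMPTION: w is computed in the clear here; a real protocol must obtain
    # shares of w without revealing it (e.g. via a secure comparison).
    w = 1 if s0 + s1 >= mod_from else 0
    # Party 0 removes both the wrap and the offset, now working in the larger ring.
    t0 = (s0 - w * mod_from - offset) % mod_to
    t1 = s1 % mod_to
    return t0, t1

x = -123456
s64 = share(x, 64)
s32 = downcast(*s64, k_to=32)
assert reveal(*s32, 32) == x
assert reveal(*upcast(*s32, k_from=32, k_to=64), 64) == x
```

Lifting the shares naively (keeping each 32-bit share unchanged inside the 64-bit ring) would reconstruct to x + w·2^32 whenever the shares wrapped, which is why the wrap bit is the costly part of an upcast.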

Findings

1. Ditto is 3.14 to 4.40 times faster than MPCFormer.
2. Ditto is 1.44 to 2.35 times faster than PUMA.
3. Ditto achieves negligible utility degradation.

Abstract

Due to the rising privacy concerns on sensitive client data and trained models like Transformers, secure multi-party computation (MPC) techniques are employed to enable secure inference despite attendant overhead. Existing works attempt to reduce the overhead using more MPC-friendly non-linear function approximations. However, the integration of quantization widely used in plaintext inference into the MPC domain remains unclear. To bridge this gap, we propose the framework named Ditto to enable more efficient quantization-aware secure Transformer inference. Concretely, we first incorporate an MPC-friendly quantization into Transformer inference and employ a quantization-aware distillation procedure to maintain the model utility. Then, we propose novel MPC primitives to support the type conversions that are essential in quantization and implement the quantization-aware MPC execution of…
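The quantization-aware distillation step can be pictured as training a quantized student to match a full-precision teacher. Below is a minimal NumPy sketch of the forward computation only, under our own assumptions: activations are fake-quantized onto a signed fixed-point grid (`frac_bits` and `total_bits` are illustrative parameters, not values from the paper), and the loss mixes a hidden-state MSE term with a soft-label cross-entropy term weighted by a hypothetical `alpha`. Actual training would also need an autodiff framework and a straight-through estimator for the rounding step.

```python
import numpy as np

def fake_quant(x, frac_bits=8, total_bits=16):
    """Snap x onto a signed fixed-point grid with `frac_bits` fractional bits,
    clipped to the range of a `total_bits`-bit integer (round half to even)."""
    scale = 2.0 ** frac_bits
    qmin, qmax = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    return np.clip(np.round(x * scale), qmin, qmax) / scale

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distill_loss(s_hidden, t_hidden, s_logits, t_logits, alpha=0.5, temp=1.0):
    """alpha * hidden-state MSE + (1 - alpha) * soft-label cross-entropy."""
    hidden_mse = np.mean((s_hidden - t_hidden) ** 2)
    p_teacher = softmax(t_logits / temp)
    log_p_student = np.log(softmax(s_logits / temp) + 1e-12)
    soft_ce = -np.mean(np.sum(p_teacher * log_p_student, axis=-1))
    return alpha * hidden_mse + (1 - alpha) * soft_ce

rng = np.random.default_rng(0)
t_hidden = rng.normal(size=(4, 8))   # stand-in teacher activations
s_hidden = fake_quant(t_hidden)      # student sees quantized activations
t_logits = rng.normal(size=(4, 3))
loss = distill_loss(s_hidden, t_hidden, fake_quant(t_logits), t_logits)
```

With 8 fractional bits, the rounding error per in-range element is at most 2^-9, so the hidden-state term stays small; the cross-entropy term keeps the student's output distribution close to the teacher's.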

Peer Reviews

Decision: ICML 2024 Poster

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

+ MPC-friendly quantization-aware distillation.
+ MPC primitives for scale down and scale up.
+ Comparison with SOTA.
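"Scale down" and "scale up" refer to changing the fixed-point scale of secret-shared values. As a rough illustration only, the sketch below assumes two-party additive sharing over Z_{2^64} and interprets scale up as multiplying by 2^d (a local operation) and scale down as dividing by 2^f via the well-known local truncation from SecureML, which is correct up to one unit in the last place except with probability about |x| / 2^63. Ditto's own primitives may work differently, and all function names here are hypothetical.

```python
import secrets

K = 64
MOD = 1 << K

def share(x):
    """Two-party additive shares of a signed integer over Z_{2^64}."""
    r = secrets.randbelow(MOD)
    return r, (x - r) % MOD

def reveal(s0, s1):
    """Reconstruct and interpret as a signed 64-bit integer."""
    v = (s0 + s1) % MOD
    return v - MOD if v >= MOD // 2 else v

def scale_up(s0, s1, d):
    """Multiply by 2^d: each party shifts its own share locally (no overflow assumed)."""
    return (s0 << d) % MOD, (s1 << d) % MOD

def scale_down(s0, s1, f):
    """SecureML-style local truncation by 2^f.
    Party 0 floors its share; party 1 floors the negation of its share and negates back.
    Fails (large error) only when the shares wrap, roughly with probability |x| / 2^63."""
    t0 = s0 >> f
    t1 = (MOD - (((MOD - s1) % MOD) >> f)) % MOD
    return t0, t1

x = -42
up = scale_up(*share(x), 16)     # fixed-point encoding with 16 fractional bits
down = scale_down(*up, 16)       # back to the integer scale
```

Keeping scale down local avoids an interaction round, at the cost of a small rounding error and a negligible failure probability when the ring is much wider than the values it carries.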

Weaknesses

- Distillation is already widely used in MPC-based secure inference works.
- The contribution on the MPC protocol side seems limited.

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

1. This paper targets an important problem in private inference.
2. The proposed type conversion protocols are creative solutions to a key challenge in quantization-aware secure inference.
3. Extensive evaluations analyze efficiency, utility, and scalability, including how communication cost and latency vary with factors like sequence length and batch size.

Weaknesses

1. Lack of comparison to the latest related work.

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 2

Strengths

* The authors present a solution that addresses multiple bottlenecks in secure multi-party computation (MPC) for Transformer models, such as handling non-linear functions and dynamic quantization in an MPC context. They offer solutions to these issues, such as modified dyadic quantization and static dyadic quantization.
* The paper highlights and addresses the often-overlooked disconnect between expertise in machine learning and expertise in multi-party computation. For examp…
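Dyadic quantization represents a real-valued rescaling factor as m / 2^s with integer m, so requantizing an integer accumulator needs only an integer multiply and a right shift, both of which map onto standard MPC operations (the shift being a truncation). The sketch below is a plaintext illustration under our own assumptions: a 16-bit shift, round-to-nearest, and int8 output clipping. We read "static" as meaning the scale is calibrated offline and fixed at inference time; how Ditto modifies the scheme is not reproduced here, and the function names are hypothetical.

```python
def to_dyadic(scale: float, shift: int = 16) -> tuple[int, int]:
    """Approximate a positive real scale as m / 2**shift with integer m."""
    return round(scale * (1 << shift)), shift

def requantize(acc: int, m: int, shift: int, qmin: int = -128, qmax: int = 127) -> int:
    """Integer-only rescaling: (acc * m) / 2**shift, rounded, then clipped to int8."""
    y = (acc * m + (1 << (shift - 1))) >> shift  # add half, then arithmetic shift (floor)
    return max(qmin, min(qmax, y))

m, s = to_dyadic(0.05)  # scale assumed to be calibrated offline ("static")
print(m, s, requantize(1000, m, s), requantize(-1000, m, s), requantize(10000, m, s))
# prints: 3277 16 50 -50 127
```

Because m and s are fixed ahead of time, the secure computation never has to handle a floating-point scale; only the multiply-and-truncate remains.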

Weaknesses

* The paper acknowledges that both Ditto and MPCFormer exhibit noticeable utility drops on BERT tasks when using a ReLU approximation for Softmax. The authors offer a Quad approximation for GeLU to balance utility and efficiency, but this limitation may constrain the framework's applicability to tasks where such approximations are not tolerable.
* The paper is generally hard to read and requires additional proofreading. I would recommend revising it for readability.
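MPCFormer-style approximations replace expensive non-linearities with operations that are cheap under MPC. The NumPy sketch below shows our understanding of the two replacements named in this weakness: a ReLU-based Softmax that normalizes ReLU(x) instead of exp(x), and the Quad GeLU 0.125x² + 0.25x + 0.5. The coefficients follow MPCFormer as we recall them, and the `eps` guard against an all-negative row is our own addition. Because these functions deviate markedly from the originals (Quad GeLU gives 0.5 at x = 0, where GeLU gives 0), distillation is needed to recover accuracy.

```python
import math
import numpy as np

def gelu(x):
    """Exact GeLU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def quad_gelu(x):
    """Quadratic GeLU replacement: only additions and multiplications, cheap under MPC."""
    return 0.125 * x**2 + 0.25 * x + 0.5

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relu_softmax(x, axis=-1, eps=1e-6):
    """ReLU-based Softmax replacement: normalize ReLU(x) instead of exp(x)."""
    r = np.maximum(x, 0.0)
    return r / (r.sum(axis=axis, keepdims=True) + eps)

scores = np.array([1.0, 3.0, -2.0])
print(softmax(scores))
print(relu_softmax(scores))  # the negative score gets exactly zero weight
xs = np.linspace(-3, 3, 7)
print(np.round(gelu(xs) - quad_gelu(xs), 3))  # pointwise gap between exact and Quad GeLU
```

The ReLU variant assigns zero attention weight to every negative score, unlike exp-based Softmax, which is one plausible source of the utility drops the reviewer mentions.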

Code & Models

Repositories

secretflow/spu (official; framework tag: tf)

Taxonomy

Topics: Advanced Memory and Neural Computing · Advanced Data Storage Technologies · Semiconductor materials and devices

Methods: Attention Is All You Need · Weight Decay · Attention Dropout · Dropout · Label Smoothing · Residual Connection · Softmax · WordPiece · Position-Wise Feed-Forward Layer