Flash: A Hybrid Private Inference Protocol for Deep CNNs with High Accuracy and Low Latency on CPU

Hyeri Roh; Jinsu Yeo; Yeongil Ko; Gu-Yeon Wei; David Brooks; Woo-Seok Choi

arXiv:2401.16732 · cs.CR · January 20, 2025 · 1 citation

TL;DR

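Flash is a hybrid private inference protocol for deep CNNs, combining homomorphic encryption (HE) and secure two-party computation (2PC), that achieves high accuracy and end-to-end latency under 1 minute on CPU. It pairs a fast rotation-based convolution algorithm with models trained to use the polynomial activation $x^{2} + x$ in place of ReLU.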

Contribution

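The paper introduces a hybrid HE/2PC private inference protocol that reduces latency and communication costs for deep CNNs on CPU through a low-latency convolution algorithm, a training strategy for models that use $x^{2} + x$ instead of ReLU, and an efficient 2PC-based evaluation of that polynomial.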

Findings

1. End-to-end inference latency less than 1 minute on CPU for deep CNNs.

2. Achieves 4-94x performance gain over state-of-the-art in convolution operations.

3. Reduces communication cost by 84-196x for activation evaluation.
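For context on the convolution finding: HE-based convolutions are commonly assembled from slot rotations and slot-wise multiply-accumulate steps on a packed vector. The sketch below simulates that textbook pattern on plaintext Python lists for a circular 1D convolution. It is not Flash's algorithm or data encoding, which this page does not describe; the function names `rotate`, `conv1d_by_rotation`, and `conv1d_direct` are hypothetical, and a real HE library would replace the list operations with ciphertext rotations and multiplications.

```python
from typing import List


def rotate(slots: List[int], k: int) -> List[int]:
    """Cyclic left rotation by k slots (plaintext stand-in for an HE slot rotation)."""
    k %= len(slots)
    return slots[k:] + slots[:k]


def conv1d_by_rotation(x: List[int], w: List[int]) -> List[int]:
    """Circular 1D correlation: y[i] = sum_j w[j] * x[(i + j) mod n].

    Uses one rotation and one slot-wise multiply-accumulate per kernel tap,
    which is the cost pattern that rotation-based HE convolutions pay.
    """
    acc = [0] * len(x)
    for j, wj in enumerate(w):
        shifted = rotate(x, j)  # align x[i + j] with output slot i
        acc = [a + wj * s for a, s in zip(acc, shifted)]
    return acc


def conv1d_direct(x: List[int], w: List[int]) -> List[int]:
    """Reference implementation computed index by index, for comparison."""
    n = len(x)
    return [sum(w[j] * x[(i + j) % n] for j in range(len(w))) for i in range(n)]


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    w = [1, 0, -1]
    print(conv1d_by_rotation(x, w))
    print(conv1d_direct(x, w))
```

Because the number of rotations grows with the kernel size in this naive form, the cost of each rotation dominates, which is why a faster rotation operation or an encoding that needs fewer rotations can translate into a large convolution speedup.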

Abstract

This paper presents Flash, an optimized hybrid private inference (PI) protocol utilizing both homomorphic encryption (HE) and secure two-party computation (2PC), which can reduce the end-to-end PI latency for deep CNN models to less than 1 minute on CPU. To this end, first, Flash proposes a low-latency convolution algorithm built upon a fast slot rotation operation and a novel data encoding scheme, which results in a 4-94x performance gain over the state of the art. Second, to minimize the communication cost introduced by the standard nonlinear activation function ReLU, Flash replaces all ReLUs with the polynomial $x^{2} + x$ and trains deep CNN models with a new training strategy. The trained models improve the inference accuracy for CIFAR-10/100 and TinyImageNet by 16% on average (up to 40% for ResNet-32) compared to prior art. Last, Flash proposes an efficient 2PC-based $x^{2} + x$ …
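To illustrate why a polynomial activation such as $x^{2} + x$ is cheap under 2PC, here is a minimal sketch that evaluates it on additive secret shares with a textbook Beaver-style squaring pair. This is not the protocol proposed in the paper, whose details are cut off above. The ring size `MOD`, the trusted dealer that hands out the correlated randomness, the integer (non-fixed-point) encoding, and all function names are assumptions made for the example.

```python
import secrets

MOD = 2 ** 32  # hypothetical ring Z_{2^32}; negative values are encoded mod MOD


def share(v: int) -> tuple:
    """Split v into two additive shares that sum to v modulo MOD."""
    r = secrets.randbelow(MOD)
    return r, (v - r) % MOD


def reconstruct(s0: int, s1: int) -> int:
    """Recombine two additive shares."""
    return (s0 + s1) % MOD


def square_plus_x_shared(x0: int, x1: int) -> tuple:
    """Return additive shares of x^2 + x given additive shares (x0, x1) of x.

    Assumption: a trusted dealer supplies shares of a random mask a and of a*a.
    In a real deployment this correlated randomness comes from an offline phase.
    """
    a = secrets.randbelow(MOD)
    a0, a1 = share(a)
    c0, c1 = share(a * a % MOD)

    # Each party publishes its share of d = x - a. Since a is uniform,
    # the opened value d reveals nothing about x.
    d = ((x0 - a0) + (x1 - a1)) % MOD

    # Identity used: x^2 = (d + a)^2 = d^2 + 2*d*a + a^2.
    # Party 0 also adds the public term d^2; both parties add their share of x.
    y0 = (d * d + 2 * d * a0 + c0 + x0) % MOD
    y1 = (2 * d * a1 + c1 + x1) % MOD
    return y0, y1


if __name__ == "__main__":
    x0, x1 = share(5)
    print(reconstruct(*square_plus_x_shared(x0, x1)))
```

In this sketch the online phase opens a single masked ring element, with no comparison involved, whereas ReLU requires a secure comparison. That difference is the usual motivation for replacing ReLU with a low-degree polynomial in 2PC settings.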

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques