Memory-Efficient Deep Learning Inference in Trusted Execution Environments

Jean-Baptiste Truong; William Gallagher; Tian Guo; Robert J. Walls

arXiv:2104.15109·cs.CR·October 1, 2021

TL;DR

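This paper presents techniques to speed up deep neural network inference in trusted execution environments (TEEs): a novel partitioning scheme that cuts the memory footprint of convolutional layers, and quantization and compression that reduce the cost of decrypting large fully-connected weight matrices.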

Contribution

It introduces y-plane partitioning, which gives convolutional layers a consistent execution time and a smaller memory footprint, and applies quantization and compression to the large weight matrices of fully-connected layers to reduce decryption cost inside the TEE.
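The quantize-then-compress idea for fully-connected weights can be illustrated with a minimal sketch. The summary does not say which quantization scheme or compressor the authors use, so the 8-bit affine quantization, the use of `zlib`, and the helper names `quantize_uint8` and `dequantize_uint8` below are all assumptions made for illustration only.

```python
import random
import struct
import zlib


def quantize_uint8(weights):
    """Affine-quantize a flat list of floats to 8-bit codes (assumed scheme).

    Returns (codes, scale, lo) such that w is approximately lo + scale * code.
    """
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 if all weights are equal
    codes = bytes(min(255, max(0, round((w - lo) / scale))) for w in weights)
    return codes, scale, lo


def dequantize_uint8(codes, scale, lo):
    """Map 8-bit codes back to approximate float weights."""
    return [lo + scale * c for c in codes]


# Hypothetical fully-connected weights; real layers are far larger.
random.seed(0)
weights = [random.gauss(0.0, 0.05) for _ in range(4096)]

raw = struct.pack(f"{len(weights)}f", *weights)  # float32 storage, 4 bytes each
codes, scale, lo = quantize_uint8(weights)       # 1 byte per weight
packed = zlib.compress(codes, 9)                 # what would be stored and decrypted

# Inside the enclave: decompress, then dequantize before the matrix multiply.
restored = dequantize_uint8(zlib.decompress(packed), scale, lo)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(len(raw), len(codes), len(packed), max_error)
```

In this sketch the payload shrinks from 4 bytes per weight to at most roughly 1 byte per weight, at the price of a rounding error bounded by half the quantization step. Whether that accuracy trade-off is acceptable depends on the model, and the paper should be consulted for the scheme actually evaluated.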

Findings

01

With the proposed optimizations, latency overhead ranged from 1.09X to 2X baseline across a wide range of TEE sizes

02

An unmodified implementation incurred latencies of up to 26X baseline when running inside the TEE

03

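Y-plane partitioning significantly reduces the memory footprint of convolutional layers, and quantization and compression address the cost of decrypting large fully-connected weight matrices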

Abstract

This study identifies and proposes techniques to alleviate two key bottlenecks to executing deep neural networks in trusted execution environments (TEEs): page thrashing during the execution of convolutional layers and the decryption of large weight matrices in fully-connected layers. For the former, we propose a novel partitioning scheme, y-plane partitioning, designed to (i) provide consistent execution time when the layer output is large compared to the TEE secure memory; and (ii) significantly reduce the memory footprint of convolutional layers. For the latter, we leverage quantization and compression. In our evaluation, the proposed optimizations incurred latency overheads ranging from 1.09X to 2X baseline for a wide range of TEE sizes; in contrast, an unmodified implementation incurred latencies of up to 26X when running inside of the TEE.
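To make the partitioning idea concrete, here is a minimal sketch of a convolution computed in bands of output rows. It assumes that partitioning along the y axis means each partition produces a few output rows and therefore touches only the input rows under its kernel window. The abstract does not spell out the scheme, so this interpretation, the single-channel setup, and the function names `conv2d_rows` and `conv2d_partitioned` are illustrative assumptions rather than the authors' implementation.

```python
def conv2d_rows(image, kernel, y_start, y_end):
    """Valid, stride-1 2D cross-correlation for output rows [y_start, y_end)."""
    kh, kw = len(kernel), len(kernel[0])
    out_w = len(image[0]) - kw + 1
    # Only this band of input rows is needed for the requested output rows,
    # so the working set scales with the partition size, not the whole layer.
    band = image[y_start : y_end + kh - 1]
    out = []
    for y in range(y_end - y_start):
        row = []
        for x in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += band[y + i][x + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def conv2d_partitioned(image, kernel, rows_per_partition):
    """Compute the full output one band of rows at a time."""
    out_h = len(image) - len(kernel) + 1
    out = []
    for y0 in range(0, out_h, rows_per_partition):
        y1 = min(y0 + rows_per_partition, out_h)
        out.extend(conv2d_rows(image, kernel, y0, y1))
    return out


# Hypothetical 8x8 single-channel input and a 3x3 kernel.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
kernel = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]

full = conv2d_rows(image, kernel, 0, 6)  # whole output in one pass
tiled = conv2d_partitioned(image, kernel, rows_per_partition=2)

print(tiled == full)
```

The partitioned result matches the single-pass result, while each call holds only `rows_per_partition + kernel_height - 1` input rows at a time. In a TEE, the partition size would be chosen so that this working set fits in secure memory and avoids paging; how the paper selects it is not stated in this summary.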
