Confidential Computing on NVIDIA Hopper GPUs: A Performance Benchmark Study

Jianwei Zhu; Hang Yin; Peng Deng; Aline Almeida; Shunfan Zhou

arXiv:2409.03992 · cs.DC · November 6, 2024

TL;DR

This study benchmarks the performance impact of Trusted Execution Environments (TEE) on NVIDIA Hopper GPUs during large language model (LLM) inference, showing that CPU-GPU data transfer is the main bottleneck while the computational overhead on the GPU itself is minimal.

Contribution

The study provides the first detailed performance analysis of TEE mode on NVIDIA Hopper GPUs for LLM inference, identifying data transfer as the primary performance factor.

Findings

1. Overhead is below 7% for most LLM queries.
2. Larger models and longer sequences have nearly zero overhead.
3. Data transfer via PCIe is the main bottleneck.

Abstract

This report evaluates the performance impact of enabling Trusted Execution Environments (TEE) on NVIDIA Hopper GPUs for large language model (LLM) inference tasks. We benchmark the overhead introduced by TEE mode across various LLMs and token lengths, with a particular focus on the bottleneck caused by CPU-GPU data transfers via PCIe. Our results indicate that while there is minimal computational overhead within the GPU, the overall performance penalty is primarily attributable to data transfer. For the majority of typical LLM queries, the overhead remains below 7%, with larger models and longer sequences experiencing nearly zero overhead.
