Performance of Confidential Computing GPUs
Antonio Martínez Ibarra, Julian James Stephen, Aurora González Vidal, K. R. Jayaram, Antonio Fernando Skarmeta Gómez

TL;DR
This paper evaluates how confidential computing (CC) mode affects LLM inference on a single NVIDIA H100 GPU with model swapping, and finds that non-confidential (No-CC) mode performs better on latency, throughput, and SLA attainment.
Contribution
It compares confidential and non-confidential GPU inference across varying traffic loads, traffic distribution patterns, scheduling strategies, and SLA requirements, in scenarios that require active model swapping.
Findings
In No-CC mode, latency for relaxed batch inference with model swapping is 20-30% lower than in confidential mode.
Throughput is 45-70% higher in non-confidential mode.
SLA attainment is 15-20% better without confidentiality overhead.
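A note on reading these percentages: the abstract states the latency result as No-CC being 20-30% lower than confidential mode, which is not the same number as the increase seen when moving from No-CC to confidential mode, because the baseline differs. The minimal sketch below shows the conversion; the function names are hypothetical and the arithmetic is generic, not taken from the paper.

```python
def lower_to_higher(p_lower: float) -> float:
    """Given 'A is p_lower lower than B', return how much higher B is than A.

    If A = B * (1 - p), then B / A - 1 = p / (1 - p).
    """
    if not 0.0 <= p_lower < 1.0:
        raise ValueError("p_lower must be in [0, 1)")
    return p_lower / (1.0 - p_lower)


def higher_to_lower(p_higher: float) -> float:
    """Given 'B is p_higher higher than A', return how much lower A is than B.

    If B = A * (1 + p), then 1 - A / B = p / (1 + p).
    """
    if p_higher < 0.0:
        raise ValueError("p_higher must be non-negative")
    return p_higher / (1.0 + p_higher)


# The 20-30% "lower" range quoted for No-CC latency, re-expressed
# with No-CC as the baseline instead of confidential mode.
for p in (0.20, 0.30):
    print(f"{p:.0%} lower <=> {lower_to_higher(p):.1%} higher")
```

Under this reading, a 20-30% lower latency in No-CC mode corresponds to confidential mode being roughly 25% to 43% slower when measured against the No-CC baseline.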
Abstract
This work examines latency, throughput, and other metrics when performing inference on confidential GPUs. We explore different traffic patterns and scheduling strategies using a single Virtual Machine with one NVIDIA H100 GPU, to perform relaxed batch inferences on multiple Large Language Models (LLMs), operating under the constraint of swapping models in and out of memory, which necessitates efficient control. The experiments simulate diverse real-world scenarios by varying parameters such as traffic load, traffic distribution patterns, scheduling strategies, and Service Level Agreement (SLA) requirements. The findings provide insights into the differences between confidential and non-confidential settings when performing inference in scenarios requiring active model swapping. Results indicate that in No-CC mode, relaxed batch inference with model swapping latency is 20-30% lower than…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Distributed systems and fault tolerance
