Fastrack: Fast IO for Secure ML using GPU TEEs
Yongqin Wang, Rachit Rajat, Jonghyun Lee, Tingting Tang, Murali Annavaram

TL;DR
Fastrack introduces optimizations for GPU TEE communication in secure ML, significantly reducing overheads and runtime, thus enabling faster secure training and inference on cloud platforms.
Contribution
The paper presents Fastrack, a novel system that reduces communication overheads in GPU TEE secure ML by optimizing data transfer and authentication processes.
Findings
Inference speed improved by up to 84.6%
Baseline GPU TEE training is 10% to 455% slower than non-TEE training, the overhead Fastrack targets
Communication costs reduced substantially
Abstract
As cloud-based ML expands, ensuring data security during training and inference is critical. GPU-based Trusted Execution Environments (TEEs) offer secure, high-performance solutions, with CPU TEEs managing data movement and GPU TEEs handling authentication and computation. However, CPU-to-GPU communication overheads significantly hinder performance, as data must be encrypted, authenticated, decrypted, and verified, increasing costs by 12.69 to 33.53 times. This results in GPU TEE inference becoming 54.12% to 903.9% slower and training 10% to 455% slower than non-TEE systems, undermining GPU TEE advantages in latency-sensitive applications. This paper analyzes Nvidia H100 TEE protocols and identifies three key overheads: 1) redundant CPU re-encryption, 2) limited authentication parallelism, and 3) unnecessary operation serialization. We propose Fastrack, optimizing with 1) direct GPU…
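The slowdown percentages quoted in the abstract are easier to compare as runtime multipliers. The short sketch below does that conversion, assuming "N% slower" means the GPU TEE runtime is (1 + N/100) times the non-TEE runtime. That reading is an assumption rather than something the abstract states, and the function name and dictionary labels are illustrative only.

```python
def slowdown_to_runtime_ratio(percent_slower: float) -> float:
    """Convert 'N% slower' into a runtime multiplier over the baseline.

    Assumes 'N% slower' means runtime = baseline * (1 + N / 100).
    """
    return 1.0 + percent_slower / 100.0


# Slowdown figures quoted in the abstract (GPU TEE vs. non-TEE systems).
reported_slowdowns = {
    "inference, low end": 54.12,
    "inference, high end": 903.9,
    "training, low end": 10.0,
    "training, high end": 455.0,
}

# Runtime multipliers relative to the non-TEE baseline.
ratios = {
    name: slowdown_to_runtime_ratio(percent)
    for name, percent in reported_slowdowns.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the non-TEE runtime")
```

Under this reading, the reported inference overhead spans roughly 1.5x to 10x the non-TEE runtime, and the training overhead roughly 1.1x to 5.6x.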
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Security and Verification in Computing
