Debunking the CUDA Myth Towards GPU-based AI Systems
Yunjae Lee, Juntaek Lim, Jehyeon Bang, Eunyeong Cho, Huijong Jeong, Taesu Kim, Hyungjun Kim, Joonhyung Lee, Jinseop Im, Ranggi Hwang, Se Jung Kwon, Dongsoo Lee, Minsoo Rhu

TL;DR
This paper evaluates Intel Gaudi NPUs as a potential alternative to NVIDIA GPUs in AI systems, highlighting competitive performance and energy efficiency, but noting software maturity challenges.
Contribution
It provides a comprehensive comparison of Gaudi NPUs and GPUs, including microbenchmarking and software optimization strategies, revealing their potential and current limitations.
Findings
Gaudi-2 achieves competitive performance with A100 in key AI workloads.
Gaudi-2's energy efficiency is comparable to that of the A100.
Software maturity remains a challenge for Gaudi NPUs.
Abstract
This paper presents a comprehensive evaluation of Intel Gaudi NPUs as an alternative to NVIDIA GPUs, which are currently the de facto standard in AI system design. First, we create a suite of microbenchmarks to compare Intel Gaudi-2 with NVIDIA A100, showing that Gaudi-2 achieves competitive performance not only in primitive AI compute, memory, and communication operations but also in executing several important AI workloads end-to-end. We then assess Gaudi NPU's programmability by discussing several software-level optimization strategies to employ for implementing critical FBGEMM operators and vLLM, evaluating their efficiency against GPU-optimized counterparts. Results indicate that Gaudi-2 achieves energy efficiency comparable to A100, though there are notable areas for improvement in terms of software maturity. Overall, we conclude that, with effective integration into high-level AI…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Advanced Neural Network Applications
