Comparing CPU and GPU compute of PERMANOVA on MI300A
Igor Sfiligoi

TL;DR
This paper compares CPU and GPU performance of PERMANOVA on the AMD MI300A, whose CPU and GPU cores share the same HBM memory. The GPU cores are significantly faster when using a brute force approach, and SMT brings a significant benefit on the CPU side.
Contribution
It provides a detailed performance analysis of PERMANOVA on the AMD MI300A, showing that the GPU cores outperform the CPU cores and that SMT has a significant impact on this memory-bound computation.
Findings
GPU outperforms CPU in PERMANOVA on MI300A
GPU cores favor the brute force approach over the cache-optimized one
SMT significantly improves CPU performance
Abstract
Comparing the tradeoffs of CPU and GPU compute for memory-heavy algorithms is often challenging, due to the drastically different memory subsystems on host CPUs and discrete GPUs. The AMD MI300A is an exception, since it sports both CPU and GPU cores in a single package, all backed by the same type of HBM memory. In this paper we analyze the performance of Permutational Multivariate Analysis of Variance (PERMANOVA), a non-parametric method that tests whether two or more groups of objects are significantly different based on a categorical factor. This method is memory-bound and has been recently optimized for CPU cache locality. Our tests show that GPU cores on the MI300A prefer the brute force approach instead, significantly outperforming the CPU-based implementation. The significant benefit of Simultaneous Multithreading (SMT) was also a pleasant surprise.
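For readers unfamiliar with the method, the following is a minimal sketch of the textbook PERMANOVA pseudo-F test, written in plain Python. It is not the paper's CPU or GPU implementation; the function names `pseudo_f` and `permanova`, the default of 999 permutations, and the toy data in the usage note are assumptions made for illustration. The inner double loop sweeps the whole distance matrix once per permutation, which is the kind of brute force, memory-heavy access pattern the abstract refers to.

```python
import random


def pseudo_f(dist, groups):
    """Pseudo-F statistic from a square distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)  # number of groups
    sizes = {g: groups.count(g) for g in labels}
    ss_total = 0.0
    ss_within = 0.0
    # Brute-force pass over the upper triangle of the distance matrix.
    for i in range(n - 1):
        for j in range(i + 1, n):
            d2 = dist[i][j] ** 2
            ss_total += d2
            if groups[i] == groups[j]:
                # Within-group pairs are weighted by 1 / group size.
                ss_within += d2 / sizes[groups[i]]
    ss_total /= n
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, groups, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    f_obs = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        # Each permutation reshuffles the labels and rescans the matrix.
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= f_obs:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    p_value = (hits + 1) / (n_perm + 1)
    return f_obs, p_value
```

As a usage example, take six points on a line at positions 0, 1, 2, 10, 11, 12, split into two groups of three, with `dist[i][j]` set to the absolute difference of positions. Calling `permanova(dist, [0, 0, 0, 1, 1, 1])` then gives an observed pseudo-F of 150 (among-group sum of squares 150, within-group sum of squares 4, with 1 and 4 degrees of freedom).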
