C-for-Metal: High Performance SIMD Programming on Intel GPUs
Guei-Yuan Lueh, Kaiyu Chen, Gang Chen, Joel Fuentes, Wei-Yu Chen, Fangwen Fu, Hong Jiang, Hongzheng Li, Daniel Rhee

TL;DR
This paper introduces C-For-Metal (CM), an explicit SIMD programming framework for Intel GPUs that lets developers achieve close-to-the-metal performance, outperforming the best-known SIMT-based OpenCL implementations by up to 2.7x.
Contribution
The paper presents C-For-Metal, a new explicit SIMD programming language and framework tailored for Intel GPUs, addressing limitations of SIMT models and improving performance.
Findings
CM applications outperform the best-known SIMT-based OpenCL implementations by up to 2.7x
Enables fine-grained register management and SIMD control
Achieves close-to-the-metal performance on Intel GPUs
Abstract
The SIMT execution model is commonly used for general GPU development. CUDA and OpenCL developers write scalar code that is implicitly parallelized by the compiler and hardware. On Intel GPUs, however, this abstraction has profound performance implications, as the underlying ISA is SIMD and important hardware capabilities cannot be fully utilized. To close this performance gap we introduce C-For-Metal (CM), an explicit SIMD programming framework designed to deliver close-to-the-metal performance on Intel GPUs. The CM programming language and its vector/matrix types provide an intuitive interface to exploit the underlying hardware features, allowing fine-grained register management, SIMD size control and cross-lane data sharing. Experimental results show that CM applications from different domains outperform the best-known SIMT-based OpenCL implementations, achieving up to 2.7x speedup on the…
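To make the SIMT versus explicit SIMD contrast concrete, here is a minimal sketch in plain C++ (not CM syntax) of a 3-tap horizontal stencil written both ways. In the SIMT style each "work-item" loads its own three neighbours from memory; in the explicit SIMD style one thread loads a single block of W + 2 elements and forms the shifted operands by selecting lane regions of that block, which is the kind of cross-lane data sharing the abstract refers to. The `Vec` template, its `select` method, the functions `stencil_simt` and `stencil_simd`, and the width of 8 are all hypothetical illustrations, not the CM vector/matrix API; the sketch also assumes `n` is a multiple of W and that the input holds `n + 2` elements.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-width SIMD vector emulated with std::array.
// It illustrates the programming model only; it is NOT the CM vector type.
template <typename T, std::size_t N>
struct Vec {
    std::array<T, N> lanes{};

    // Load N contiguous elements starting at p (one block read).
    static Vec load(const T* p) {
        Vec v;
        for (std::size_t i = 0; i < N; ++i) v.lanes[i] = p[i];
        return v;
    }

    // Store all N lanes to contiguous memory starting at p.
    void store(T* p) const {
        for (std::size_t i = 0; i < N; ++i) p[i] = lanes[i];
    }

    // Select M consecutive lanes starting at `offset`: a register region,
    // so neighbouring values are shared without touching memory again.
    template <std::size_t M>
    Vec<T, M> select(std::size_t offset) const {
        assert(offset + M <= N);
        Vec<T, M> r;
        for (std::size_t i = 0; i < M; ++i) r.lanes[i] = lanes[offset + i];
        return r;
    }

    // Lane-wise addition.
    Vec operator+(const Vec& o) const {
        Vec r;
        for (std::size_t i = 0; i < N; ++i) r.lanes[i] = lanes[i] + o.lanes[i];
        return r;
    }
};

// SIMT style: the loop stands in for n independent work-items, each of
// which reads its own three inputs from memory.
void stencil_simt(const int* in, int* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] + in[i + 1] + in[i + 2];
}

// Explicit SIMD style with an assumed SIMD width of 8.
constexpr std::size_t W = 8;

void stencil_simd(const int* in, int* out, std::size_t n) {
    assert(n % W == 0);  // remainder handling omitted in this sketch
    for (std::size_t base = 0; base < n; base += W) {
        // One load of W + 2 elements covers all three shifted operands.
        Vec<int, W + 2> block = Vec<int, W + 2>::load(in + base);
        Vec<int, W> sum =
            block.select<W>(0) + block.select<W>(1) + block.select<W>(2);
        sum.store(out + base);
    }
}
```

Both functions compute the same result; the difference is that the SIMD version issues one wide load per W outputs and reuses it across lanes, whereas the SIMT version leaves that reuse to the compiler and hardware.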
