A Non-linear GPU Thread Map for Triangular Domains
Cristóbal A. Navarro, Benjamín Bustos, Nancy Hitschfeld

TL;DR
This paper introduces a new GPU thread mapping method for triangular domains that reduces unnecessary threads and improves performance, especially in memory access and shared memory scenarios, with potential extensions to tetrahedral domains.
Contribution
A novel linear thread map λ(ω), based on the properties of the lower triangular matrix, for efficient GPU computation on triangular domains. It reduces the number of unnecessary threads and improves performance.
Findings
Up to 18% improvement in global memory access performance over the bounding-box approach.
λ(ω) outperforms the bounding-box and recursive approaches.
Achieves 7% performance gain in shared memory scenarios.
Abstract
There is a stage in the GPU computing pipeline where a grid of thread-blocks, in \textit{parallel space}, is mapped onto the problem domain, in \textit{data space}. Since the parallel space is restricted to a box-type geometry, the mapping approach is typically a $k$-dimensional bounding box (BB) that covers a $p$-dimensional data space. Threads that fall inside the domain perform computations while threads that fall outside are discarded at runtime. In this work we study the case of mapping threads efficiently onto triangular domain problems and propose a block-space linear map $\lambda(\omega)$, based on the properties of the lower triangular matrix, that reduces the number of unnecessary threads from $O(n^2)$ to $O(n)$. Performance results for global memory accesses show an improvement of up to $18\%$ with respect to the \textit{bounding-box} approach, placing…
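The idea in the abstract can be illustrated with a small host-side sketch. It is not the authors' implementation: it uses the standard inverse of the triangular-number enumeration to turn a linear block index into a (row, column) pair in the lower triangle, and it counts wasted threads for a bounding box and for a triangular block grid. The names `tri_map`, `wasted_threads`, `N` (side of the triangular domain, diagonal included) and `rho` (threads per block side, square blocks assumed) are hypothetical and chosen only for illustration.

```python
from math import isqrt


def tri_map(w):
    """Map a linear block index w >= 0 to (row, col) with col <= row.

    Blocks are enumerated row by row over the lower triangle:
    0 -> (0, 0), 1 -> (1, 0), 2 -> (1, 1), 3 -> (2, 0), ...
    Integer square root keeps the computation exact (an assumption of
    this sketch; a GPU kernel would typically use a floating-point sqrt).
    """
    row = (isqrt(8 * w + 1) - 1) // 2
    col = w - row * (row + 1) // 2
    return row, col


def wasted_threads(N, rho):
    """Return (bounding_box_waste, triangular_grid_waste) in threads.

    N   : side of the triangular data domain, N * (N + 1) // 2 elements
    rho : threads per block side (hypothetical square rho x rho blocks)
    """
    n = -(-N // rho)                      # blocks per side, rounded up
    useful = N * (N + 1) // 2             # elements that need a thread
    bb = (n * rho) ** 2 - useful          # full n x n grid of blocks
    tri = (n * (n + 1) // 2) * rho * rho - useful  # lower-triangular grid
    return bb, tri


if __name__ == "__main__":
    print([tri_map(w) for w in range(6)])
    for size in (1024, 2048):
        print(size, wasted_threads(size, 16))
```

In this sketch, doubling `N` roughly quadruples the bounding-box waste, while the waste of the triangular grid only doubles, because the only idle threads left sit in the blocks on the diagonal.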
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Graph Theory and Algorithms
