Improving the GPU space of computation under triangular domain problems
Cristobal A. Navarro, Nancy Hitschfeld

TL;DR
This paper introduces a new GPU thread-block mapping function optimized for triangular domain problems, significantly reducing wasted computation and improving performance over existing methods on NVIDIA GPUs.
Contribution
The paper proposes a mapping function g(lambda) for GPU computation over triangular domains. It reduces the number of wasted thread-blocks, runs faster than the bounding-box and UTM strategies, and matches RB in speed while keeping thread organization intact.
Findings
g(lambda) reduces wasted blocks from O(n^2) to O(n)
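Experimental results show g(lambda) running 12% to 15% faster than the bounding-box (BB) strategy
g(lambda) outperforms the upper-triangular mapping (UTM) and matches the rectangular box (RB) strategy in speed, while preserving thread organization within a block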
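The drop from O(n^2) to O(n) wasted blocks can be checked by simple counting. The sketch below is illustrative only: it assumes a triangular domain with n blocks per side, a bounding-box grid of n x n blocks, and, for the mapped strategy, a near-square grid just large enough to hold the n(n+1)/2 useful blocks. The grid shape and the function names are assumptions made here, not details taken from the paper.

```python
from math import isqrt


def needed_blocks(n):
    """Blocks that actually cover a triangular domain with n blocks per side."""
    return n * (n + 1) // 2


def wasted_bounding_box(n):
    """Wasted blocks when an n x n bounding-box grid is launched."""
    return n * n - needed_blocks(n)


def wasted_compact_grid(n):
    """Wasted blocks when a near-square grid holds only the useful blocks.

    Assumption: the grid side is ceil(sqrt(n(n+1)/2)), so only the padding
    needed to round up to a square is wasted.
    """
    need = needed_blocks(n)
    side = isqrt(need - 1) + 1 if need > 0 else 0  # ceil(sqrt(need))
    return side * side - need


if __name__ == "__main__":
    for n in (64, 256, 1024):
        print(n, wasted_bounding_box(n), wasted_compact_grid(n))
```

Under these assumptions the bounding box wastes n(n-1)/2 blocks, which grows quadratically, while the compact grid wastes fewer than two grid sides' worth of blocks, which grows linearly in n.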
Abstract
There is a stage in the GPU computing pipeline where a grid of thread-blocks is mapped to the problem domain. Normally, this grid is a k-dimensional bounding box that covers a k-dimensional problem no matter its shape. Threads that fall inside the problem domain perform computations, otherwise they are discarded at runtime. For problems with non-square geometry, this is not always the best idea because part of the space of computation is executed without any practical use. Two-dimensional triangular domain problems, alias td-problems, are a particular case of interest. Problems such as the Euclidean distance map, LU decomposition, collision detection and simulations over triangular tiled domains are all td-problems and they appear frequently in many areas of science. In this work, we propose an improved GPU mapping function g(lambda), that maps any lambda block to a unique location…
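To make the idea of mapping a linear block index to a unique location concrete, here is a minimal host-side sketch. It assumes that blocks are enumerated row by row over a lower-triangular layout, diagonal included, so that row i starts at index i(i+1)/2. The closed form follows from inverting that triangular number; the name `g`, the use of an integer square root, and the enumeration order are choices made for this illustration, and the paper's exact formulation may differ.

```python
from math import isqrt


def g(lam):
    """Map a linear block index lam >= 0 to (i, j) with 0 <= j <= i.

    Assumes row-major enumeration of the lower triangle: row i holds
    indices i(i+1)/2 through i(i+1)/2 + i.
    """
    # Largest i such that i(i+1)/2 <= lam, computed with exact integer math.
    i = (isqrt(8 * lam + 1) - 1) // 2
    # Offset of lam within row i.
    j = lam - i * (i + 1) // 2
    return i, j


if __name__ == "__main__":
    for lam in range(6):
        print(lam, g(lam))
```

In a GPU kernel the same arithmetic would be applied once per block rather than per thread, which is why a mapping of this kind leaves the thread layout inside each block unchanged. A floating-point square root is the likely choice on device; the integer version is used here only to keep the sketch exact.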
Taxonomy
Topics: Optimization and Search Problems · Parallel Computing and Optimization Techniques · Computational Geometry and Mesh Generation
