Improving the GPU space of computation under triangular domain problems

Cristobal A. Navarro; Nancy Hitschfeld

arXiv:1308.1419 · cs.DC · August 27, 2015 · 2 cites

Open Access

TL;DR

This paper introduces a new GPU thread-block mapping function optimized for triangular domain problems, significantly reducing wasted computation and improving performance over existing methods on NVIDIA GPUs.

Contribution

The paper proposes a mapping function g(lambda) for GPU computation over triangular domains. It cuts the number of wasted thread-blocks and runs faster than the bounding box strategy.

Findings

01

g(lambda) reduces wasted blocks from O(n^2) to O(n)

02

Experimental results show a 12-15% speedup over the bounding box strategy

03

Outperforms UTM and matches RB in speed, while maintaining thread organization
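
The block counts behind the first finding can be checked with a little arithmetic. The sketch below is an illustration, not the authors' code. It assumes a triangular domain with n blocks per side (so n(n+1)/2 useful blocks), a bounding box strategy that launches an n x n grid, and an improved strategy that packs the useful blocks into the smallest square grid that holds them. All function names are hypothetical.

```python
import math

def useful_blocks(n):
    # Blocks in a triangle with n blocks per side, diagonal included.
    return n * (n + 1) // 2

def wasted_bounding_box(n):
    # Assumption: the bounding box launches n * n blocks.
    return n * n - useful_blocks(n)

def wasted_packed_grid(n):
    # Assumption: the useful blocks are packed into the smallest
    # square grid of side ceil(sqrt(useful_blocks(n))).
    t = useful_blocks(n)
    side = math.isqrt(t)
    if side * side < t:
        side += 1
    return side * side - t

for n in (64, 256, 1024):
    print(n, wasted_bounding_box(n), wasted_packed_grid(n))
```

Under these assumptions the bounding box wastes n(n-1)/2 blocks, which grows quadratically, while the packed grid wastes fewer than twice its side length, which grows linearly in n.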

Abstract

There is a stage in the GPU computing pipeline where a grid of thread-blocks is mapped to the problem domain. Normally, this grid is a k-dimensional bounding box that covers a k-dimensional problem regardless of its shape. Threads that fall inside the problem domain perform computations; the rest are discarded at runtime. For problems with non-square geometry, this is not always the best idea because part of the space of computation is executed without any practical use. Two-dimensional triangular domain problems, also called td-problems, are a particular case of interest. Problems such as the Euclidean distance map, LU decomposition, collision detection and simulations over triangular tiled domains are all td-problems, and they appear frequently in many areas of science. In this work, we propose an improved GPU mapping function g(lambda), that maps any lambda block to a unique location…
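
One way to realize a mapping of this kind is sketched below. It is an illustration under stated assumptions, since the abstract is truncated before the formula: it assumes blocks are numbered row by row over the lower triangle (diagonal included), so that row i starts at index i(i+1)/2, and it recovers the row by inverting that triangular number. The function name `g` follows the abstract; `lam`, `i` and `j` are hypothetical names. An exact integer square root is used here for clarity; a GPU kernel would typically compute the root in floating point.

```python
import math

def g(lam):
    """Map a linear block index lam to a (row, column) in the lower triangle."""
    # Row i is the largest integer with i * (i + 1) / 2 <= lam,
    # i.e. i = floor((sqrt(8 * lam + 1) - 1) / 2).
    i = (math.isqrt(8 * lam + 1) - 1) // 2
    # Column j is the offset of lam from the start of row i.
    j = lam - i * (i + 1) // 2
    return i, j

# Enumerate the first few blocks to see the row-by-row order.
for lam in range(6):
    print(lam, g(lam))
```

Each index lands on a distinct cell with j <= i, so no launched block falls outside the triangle except for any padding added when the linear range is rounded up to fit the launch grid.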

Taxonomy

Topics: Optimization and Search Problems · Parallel Computing and Optimization Techniques · Computational Geometry and Mesh Generation