__host__ __device__ -- Generic programming in Cuda

Thomas Mejstrik

arXiv:2309.03912 · cs.PL · September 11, 2023

Open Access

TL;DR

This paper introduces programming patterns for Cuda/C++ that enable writing safe, generic code compatible with both host and device, addressing common compilation issues caused by dual instantiation of __host__ and __device__ functions.

Contribution

It presents novel patterns for writing templated Cuda/C++ functions that work seamlessly on CPU and GPU without compiler errors.

Findings

01. Patterns reduce compiler warnings/errors for dual __host__ and __device__ functions.

02. Enables safer, more portable generic programming in Cuda/C++.

03. Improves developer productivity and code maintainability.

Abstract

We present patterns for Cuda/C++ to write safe generic code that works on both the host and the device side. Writing templated functions in Cuda/C++ for both the CPU and the GPU has the problem that, in general, both __host__ and __device__ functions are instantiated, which leads to many compiler warnings or errors.
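The problem described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper and does not show the paper's patterns. The names `HostOnlySquare`, `HostDeviceSquare`, `call_with`, and `demo` are hypothetical, and the macro fallback at the top is an assumption added so that the same file also builds with a plain C++ compiler.

```cpp
#include <cassert>

// Assumption for illustration: when this file is not compiled by nvcc,
// define the execution-space specifiers away so a host compiler accepts it.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical functor without annotations: under nvcc it is host-only.
struct HostOnlySquare {
    int operator()(int x) const { return x * x; }
};

// Hypothetical functor callable in both execution spaces.
struct HostDeviceSquare {
    __host__ __device__ int operator()(int x) const { return x * x; }
};

// Generic wrapper: under nvcc each instantiation is compiled
// both for the host and for the device.
template <typename F>
__host__ __device__ int call_with(F f, int x) {
    return f(x);
}

// Host-side use of both instantiations.
inline void demo() {
    assert(call_with(HostDeviceSquare{}, 3) == 9);
    // The device-side instantiation of this call would invoke a host-only
    // function, which is the situation nvcc typically warns about.
    assert(call_with(HostOnlySquare{}, 4) == 16);
}
```

Both calls are valid on the host. The second instantiation is the kind that typically triggers an nvcc diagnostic about calling a __host__ function from a __host__ __device__ function, even though the call is never made from device code.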

Taxonomy

Topics: Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management