__host__ __device__ -- Generic programming in Cuda
Thomas Mejstrik

TL;DR
This paper introduces programming patterns for Cuda/C++ that enable writing safe, generic code compatible with both host and device, addressing common compilation issues caused by dual instantiation of __host__ and __device__ functions.
Contribution
It presents patterns for writing templated Cuda/C++ functions that compile for both the CPU and the GPU without spurious compiler warnings or errors.
Findings
The patterns reduce the compiler warnings and errors produced by functions declared both __host__ and __device__.
Enables safer, more portable generic programming in Cuda/C++.
Improves developer productivity and code maintainability.
Abstract
We present patterns for Cuda/C++ to write safe generic code that works on both the host and the device side. Writing templated functions in Cuda/C++ for both the CPU and the GPU raises a problem: in general, both the __host__ and the __device__ versions are instantiated, which leads to many compiler warnings or errors.
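The problem described in the abstract can be illustrated with a minimal sketch. This is not one of the paper's patterns: it shows a generic __host__ __device__ function template and a widely used, but nvcc-specific, workaround, `#pragma nv_exec_check_disable` (found in libraries such as Thrust), which silences the execution-space check for the next function. The names `call_with`, `Square`, `HostOnlyDouble` and `usage_example` are hypothetical, and the empty `__host__`/`__device__` macros are an assumption added only so the file also builds with a plain C++ compiler.

```cpp
#include <cassert>

// Assumption for illustration: outside nvcc the CUDA execution-space
// specifiers do not exist, so define them away to get plain C++.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Generic function template compiled for both host and device.
// Under nvcc, instantiating it with a host-only callable typically yields
// "calling a __host__ function from a __host__ __device__ function is not
// allowed". The nvcc-specific pragma below disables that check for the
// next function only; it is a workaround, not the paper's method.
#ifdef __CUDACC__
#pragma nv_exec_check_disable
#endif
template <typename F, typename T>
__host__ __device__ T call_with(F f, T x) {
    return f(x);
}

// Hypothetical callable usable on both host and device.
struct Square {
    __host__ __device__ int operator()(int x) const { return x * x; }
};

// Hypothetical host-only callable: it has no __device__ specifier.
struct HostOnlyDouble {
    int operator()(int x) const { return 2 * x; }
};

// Host-side usage: both instantiations are valid when called on the host.
inline void usage_example() {
    assert(call_with(Square{}, 7) == 49);
    assert(call_with(HostOnlyDouble{}, 7) == 14);
}
```

Silencing the check only hides the diagnostic; it does not make `HostOnlyDouble` callable from device code, which is why a pragma like this is a blunt tool compared with patterns that keep host-only and device-capable code paths apart.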
Taxonomy
Topics: Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management
