Automatic generation of CUDA code performing tensor manipulations using C++ expression templates

Adam G.M. Lewis; Harald P. Pfeiffer

arXiv:1804.10120·cs.MS·April 27, 2018

Open Access

TL;DR

This paper introduces TLoops, a C++ library that uses expression templates to represent tensor operations, enabling automatic generation of optimized CUDA code for GPU acceleration.

Contribution

The paper presents a novel C++ library that automatically generates CUDA code from high-level tensor expressions using expression templates.

Findings

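1. TLoops emits low-level C or CUDA code equivalent to the high-level tensor expressions.

2. Benchmarks cover the expression-template code, the automatically generated C code, and the automatically generated CUDA code on several generations of NVIDIA GPU.

3. Tensor operations are written as single lines of C++ that resemble analytic equations, which simplifies porting tensor computations to NVIDIA GPUs.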

Abstract

We present a C++ library, TLoops, which uses a hierarchy of expression templates to represent operations upon tensorial quantities in single lines of C++ code that resemble analytic equations. These expressions may be run as-is, but may also be used to emit equivalent low-level C or CUDA code, which either performs the operations more quickly on the CPU, or allows them to be rapidly ported to run on NVIDIA GPUs. We detail the expression template and C++-class hierarchy that represents the expressions and which makes automatic code-generation possible. We then present benchmarks of the expression-template code, the automatically generated C code, and the automatically generated CUDA code running on several generations of NVIDIA GPU.
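
The dual use described in the abstract, running an expression as-is or emitting equivalent low-level code from it, can be illustrated with a minimal C++ sketch. This is not the TLoops API: the names `Expr`, `Vec`, `BinOp`, `emit_loop`, and `run_demo` are hypothetical, and the sketch handles only elementwise operations on fixed-size rank-1 arrays rather than the indexed tensor expressions the paper targets. Each node of the expression tree provides both an `eval` method (direct execution) and an `emit` method (generation of a C-style source string).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// CRTP base: every node of the expression tree derives from Expr<Derived>.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Leaf node: a named array of N doubles (hypothetical stand-in for a tensor).
template <std::size_t N>
struct Vec : Expr<Vec<N>> {
    std::string name;
    std::array<double, N> data{};

    explicit Vec(std::string n) : name(std::move(n)) {}

    double eval(std::size_t i) const { return data[i]; }
    std::string emit(const std::string& idx) const { return name + "[" + idx + "]"; }

    // Assigning from an expression runs a single loop with no temporaries.
    template <typename E>
    Vec& operator=(const Expr<E>& e) {
        for (std::size_t i = 0; i < N; ++i) data[i] = e.self().eval(i);
        return *this;
    }
};

// Interior node: a binary operation whose operator is encoded in the type.
// Operands are held by reference, so an expression must be consumed within
// the full expression that builds it (do not store it with `auto`).
template <typename L, typename R, char Op>
struct BinOp : Expr<BinOp<L, R, Op>> {
    const L& lhs;
    const R& rhs;
    BinOp(const L& l, const R& r) : lhs(l), rhs(r) {}

    double eval(std::size_t i) const {
        return Op == '+' ? lhs.eval(i) + rhs.eval(i) : lhs.eval(i) * rhs.eval(i);
    }
    std::string emit(const std::string& idx) const {
        return "(" + lhs.emit(idx) + " " + Op + " " + rhs.emit(idx) + ")";
    }
};

template <typename L, typename R>
BinOp<L, R, '+'> operator+(const Expr<L>& l, const Expr<R>& r) {
    return {l.self(), r.self()};
}

template <typename L, typename R>
BinOp<L, R, '*'> operator*(const Expr<L>& l, const Expr<R>& r) {
    return {l.self(), r.self()};
}

// Walks the same tree that eval() uses, but produces C source text instead.
template <std::size_t N, typename E>
std::string emit_loop(const Vec<N>& out, const Expr<E>& e) {
    return "for (int i = 0; i < " + std::to_string(N) + "; ++i) " +
           out.emit("i") + " = " + e.self().emit("i") + ";";
}

struct DemoResult {
    std::array<double, 3> values;
    std::string code;
};

// Evaluates d = a + b * c directly, then emits the equivalent C loop.
inline DemoResult run_demo() {
    Vec<3> a("a"), b("b"), c("c"), d("d");
    a.data = {1.0, 2.0, 3.0};
    b.data = {4.0, 5.0, 6.0};
    c.data = {2.0, 2.0, 2.0};
    d = a + b * c;
    return {d.data, emit_loop(d, a + b * c)};
}
```

In this sketch the line `d = a + b * c;` builds a nested `BinOp` type at compile time and evaluates it in one loop, while `emit_loop` traverses the same type to produce the text of a plain C loop. A CUDA back end could in principle follow the same pattern by emitting a kernel body instead, although how TLoops structures that step is described in the paper itself and is not reproduced here.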

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Computational Physics and Python Applications