Automatic generation of CUDA code performing tensor manipulations using C++ expression templates
Adam G.M. Lewis, Harald P. Pfeiffer

TL;DR
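This paper introduces TLoops, a C++ library that uses a hierarchy of expression templates to represent tensor operations as single lines of code resembling analytic equations. The same expressions can be run directly, or used to generate equivalent C code for the CPU or CUDA code for NVIDIA GPUs.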
Contribution
The paper presents TLoops, a C++ library whose hierarchy of expression templates and C++ classes allows high-level tensor expressions to be translated automatically into equivalent low-level C or CUDA code.
Findings
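TLoops expressions can either be executed as-is or used to emit equivalent low-level C or CUDA code.
Benchmarks compare the expression-template code, the automatically generated C code, and the automatically generated CUDA code on several generations of NVIDIA GPU.
The generated C code performs the operations more quickly on the CPU than the expression-template code, and the generated CUDA code allows tensor computations to be ported rapidly to NVIDIA GPUs.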
Abstract
We present a C++ library, TLoops, which uses a hierarchy of expression templates to represent operations upon tensorial quantities in single lines of C++ code that resemble analytic equations. These expressions may be run as-is, but may also be used to emit equivalent low-level C or CUDA code, which either performs the operations more quickly on the CPU, or allows them to be rapidly ported to run on NVIDIA GPUs. We detail the expression template and C++-class hierarchy that represents the expressions and which makes automatic code-generation possible. We then present benchmarks of the expression-template code, the automatically generated C code, and the automatically generated CUDA code running on several generations of NVIDIA GPU.
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Computational Physics and Python Applications
