A New Vectorization Technique for Expression Templates in C++
J. Progsch, Y. Ineichen, A. Adelmann

TL;DR
This paper introduces Statically Accelerated Loop Templates (SALT), a vectorization technique for C++ expression templates that is reported to match the speed of optimized BLAS libraries while keeping the flexibility of templates.
Contribution
The paper presents SALT, a novel method combining expression templates with loop unrolling to enhance vector operation performance in C++.
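To make the combination concrete, here is a minimal sketch of an expression template whose assignment loop is unrolled at compile time. It is not the SALT implementation. Every name in it (`Expr`, `Sum`, `Scaled`, `Unroll`, `Vec`) and the unroll width of 4 are assumptions chosen for illustration, and the paper's actual unrolling scheme is not described in this summary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All names below are hypothetical; this is not the SALT library's API.

// CRTP base: lets the operators below accept only expression types.
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Lazy node for l + r; nothing is computed until an element is read.
template <class L, class R>
struct Sum : Expr<Sum<L, R>> {
    const L& l;
    const R& r;
    Sum(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

// Lazy node for a * r with a scalar a.
template <class R>
struct Scaled : Expr<Scaled<R>> {
    double a;
    const R& r;
    Scaled(double a_, const R& r_) : a(a_), r(r_) {}
    double operator[](std::size_t i) const { return a * r[i]; }
    std::size_t size() const { return r.size(); }
};

// Compile-time unrolling: Unroll<N>::run expands into N element assignments.
template <std::size_t N>
struct Unroll {
    template <class Dst, class Src>
    static void run(Dst& d, const Src& s, std::size_t i) {
        d[i] = s[i];
        Unroll<N - 1>::run(d, s, i + 1);
    }
};

// Recursion terminator.
template <>
struct Unroll<0> {
    template <class Dst, class Src>
    static void run(Dst&, const Src&, std::size_t) {}
};

class Vec : public Expr<Vec> {
    std::vector<double> data_;

public:
    explicit Vec(std::size_t n, double v = 0.0) : data_(n, v) {}
    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }

    // Assigning an expression runs one fused loop over the whole tree.
    template <class E>
    Vec& operator=(const Expr<E>& e) {
        const E& src = e.self();
        assert(src.size() == size());
        constexpr std::size_t W = 4;  // assumed unroll width
        const std::size_t n = size();
        std::size_t i = 0;
        for (; i + W <= n; i += W) Unroll<W>::run(*this, src, i);  // unrolled body
        for (; i < n; ++i) data_[i] = src[i];                      // remainder
        return *this;
    }
};

template <class L, class R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}

template <class R>
Scaled<R> operator*(double a, const Expr<R>& r) {
    return Scaled<R>(a, r.self());
}
```

With these definitions, a statement such as `y = 2.0 * x + y;` on two `Vec` objects of equal length builds a `Sum<Scaled<Vec>, Vec>` at compile time and evaluates it in a single pass without temporary vectors. The expression nodes hold references, so they are only valid within the statement that creates them.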
Findings
SALT achieves performance comparable to top BLAS libraries.
Benchmarks show SALT outperforms existing expression template libraries.
The approach maintains C++ template flexibility.
Abstract
Vector operations play an important role in high performance computing and are typically provided by highly optimized libraries that implement the BLAS (Basic Linear Algebra Subprograms) interface. In C++, templates and operator overloading allow these vector operations to be implemented as expression templates, which construct custom loops at compile time and provide a more abstract interface. Unfortunately, existing expression template libraries lack the performance of fast BLAS implementations. This paper presents a new approach - Statically Accelerated Loop Templates (SALT) - to close this performance gap by combining expression templates with an aggressive loop unrolling technique. Benchmarks were conducted using the Intel C++ compiler and GNU Compiler Collection to assess the performance of our library relative to Intel's Math Kernel Library as…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Distributed and Parallel Computing Systems
