# CLBlast: A Tuned OpenCL BLAS Library

**Authors:** Cedric Nugteren

arXiv: 1705.05249 · 2018-04-30

## TL;DR

CLBlast is an open-source OpenCL BLAS library that accelerates dense linear algebra on a wide variety of devices. It can be tuned for specific hardware and problem sizes, supports half-precision (FP16) arithmetic and batched routines, and targets machine learning and HPC applications.

## Contribution

The paper introduces CLBlast, a tunable OpenCL BLAS library that is optimized for and tested on a wide variety of devices, including embedded and low-power GPUs. It supports FP16 and batched operations and offers an optional CUDA back-end.

## Key findings

- CLBlast outperforms existing OpenCL BLAS libraries on multiple devices.
- Tuning for specific hardware and problem sizes improves performance.
- FP16 support saves bandwidth, time, and energy.
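
The bandwidth saving from FP16 follows from element size: a half-precision value occupies 2 bytes where a single-precision value occupies 4. The sketch below illustrates this with Python's standard `struct` module; it is not CLBlast code. The helper names `matrix_bytes` and `to_fp16` are hypothetical, and the 1024 x 1024 matrix size is an arbitrary assumption chosen for illustration.

```python
import struct


def matrix_bytes(rows, cols, fmt):
    """Bytes needed to store a dense rows x cols matrix of the given struct format."""
    return rows * cols * struct.calcsize(fmt)


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


n = 1024  # assumed matrix dimension, for illustration only
fp32_bytes = matrix_bytes(n, n, "<f")  # 4 bytes per element
fp16_bytes = matrix_bytes(n, n, "<e")  # 2 bytes per element
ratio = fp32_bytes / fp16_bytes        # memory traffic shrinks by this factor

# The trade-off: FP16 stores fewer significant digits, so 0.1 is only approximated.
rounded = to_fp16(0.1)
print(fp32_bytes, fp16_bytes, ratio, rounded)
```

Halving the bytes per element halves the data moved for the same matrix, which is where the bandwidth saving comes from; the reduced precision is the cost an application has to tolerate.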

## Abstract

This work introduces CLBlast, an open-source BLAS library providing optimized OpenCL routines to accelerate dense linear algebra for a wide variety of devices. It is targeted at machine learning and HPC applications and thus provides a fast matrix-multiplication routine (GEMM) to accelerate the core of many applications (e.g. deep learning, iterative solvers, astrophysics, computational fluid dynamics, quantum chemistry). CLBlast has five main advantages over other OpenCL BLAS libraries: 1) it is optimized for and tested on a large variety of OpenCL devices including less commonly used devices such as embedded and low-power GPUs, 2) it can be explicitly tuned for specific problem-sizes on specific hardware platforms, 3) it can perform operations in half-precision floating-point FP16 saving bandwidth, time and energy, 4) it has an optional CUDA back-end, 5) and it can combine multiple operations in a single batched routine, accelerating smaller problems significantly. This paper describes the library and demonstrates the advantages of CLBlast experimentally for different use-cases on a wide variety of OpenCL hardware.
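
For readers unfamiliar with the routines named in the abstract, the following is a minimal pure-Python reference for what GEMM computes under the standard BLAS definition, C = alpha * A * B + beta * C, and for what a batched variant means semantically. It is a sketch for illustration only and does not use or reproduce the CLBlast API; the function names `gemm` and `gemm_batched` are hypothetical, and a loop stands in for what a real batched routine would do in a single call.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for row-major lists of lists."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    assert len(c) == m and all(len(row) == n for row in c), "c must be m x n"
    return [
        [alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
         for j in range(n)]
        for i in range(m)
    ]


def gemm_batched(alphas, a_list, b_list, betas, c_list):
    """Apply gemm() independently to each problem in the batch."""
    return [
        gemm(alpha, a, b, beta, c)
        for alpha, a, b, beta, c in zip(alphas, a_list, b_list, betas, c_list)
    ]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]

single = gemm(2.0, a, b, 0.5, c)
batch = gemm_batched([1.0, 2.0], [a, a], [b, b], [0.0, 0.5], [c, c])
print(single)
print(batch)
```

Each problem in the batch is independent, which is why many small multiplications can be combined into one routine call instead of being issued one at a time.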

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.05249/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/1705.05249/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/1705.05249/full.md

---
Source: https://tomesphere.com/paper/1705.05249