# Analyzing GPU Tensor Core Potential for Fast Reductions

**Authors:** Roberto Carrasco, Raimundo Vega, Cristóbal A. Navarro

arXiv: 1903.03640 · 2019-03-12

## TL;DR

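This paper explores using Nvidia GPU tensor cores for parallel arithmetic reduction. It proposes an algorithm that encodes the reduction as matrix-multiply-accumulate (MMA) operations and, under a simplified GPU computing model, gives a theoretical speedup of $S = \frac{4}{5}\log_2(m^2)$ over a traditional GPU-based reduction.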

## Contribution

It introduces a novel GPU tensor-core based reduction algorithm and analyzes its potential performance advantages over conventional approaches.

## Key findings

- Reduces reduction steps to $T(n) = 5\log_{m^2}(n)$
- Achieves a speedup of $S = \frac{4}{5}\log_2(m^2)$
- Demonstrates tensor cores' potential for non-MMA tasks
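
To make the two formulas concrete, the short Python sketch below evaluates them for Volta's $m=16$. The function names are hypothetical. The baseline cost of $4\log_2(n)$ steps is not quoted from the paper: it is an inference obtained by multiplying the stated $T(n)$ by the stated $S$, so treat it as an assumption.

```python
import math


def tensor_core_steps(n, m=16):
    """Step count stated in the summary: T(n) = 5 * log_{m^2}(n)."""
    return 5 * math.log(n, m * m)


def classic_steps(n):
    """ASSUMPTION: baseline cost implied by S * T(n), i.e. 4 * log2(n)."""
    return 4 * math.log2(n)


def speedup(m=16):
    """Speedup stated in the summary: S = (4/5) * log2(m^2)."""
    return 0.8 * math.log2(m * m)


if __name__ == "__main__":
    n = 65536
    print(tensor_core_steps(n))  # about 10 steps for m = 16
    print(classic_steps(n))      # 64 steps under the assumed baseline
    print(speedup())             # about 6.4 for m = 16
```

Note that $S$ does not depend on $n$: in this model the gain comes entirely from the tile size $m$, and for $m=16$ it works out to $\frac{4}{5}\cdot 8 = 6.4$.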

## Abstract

The Nvidia GPU architecture has introduced new computing elements such as the *tensor cores*, which are special processing units dedicated to performing fast matrix-multiply-accumulate (MMA) operations and accelerating *Deep Learning* applications. In this work we present the idea of using tensor cores for a different purpose, the parallel arithmetic reduction problem, and propose a new GPU tensor-core based algorithm as well as analyze its potential performance benefits in comparison to a traditional GPU-based one. The proposed method encodes the reduction of $n$ numbers as a set of $m\times m$ MMA tensor-core operations (for Nvidia's Volta architecture $m=16$) and takes advantage of the fact that each MMA operation takes just one GPU cycle. When analyzing the cost under a simplified GPU computing model, the result is that the new algorithm manages to reduce a problem of $n$ numbers in $T(n) = 5\log_{m^2}(n)$ steps with a speedup of $S = \frac{4}{5}\log_2(m^2)$.
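
The idea of encoding a reduction as $m\times m$ MMA operations can be illustrated on the CPU. The pure-Python sketch below is one plausible encoding, not necessarily the authors' exact scheme: it multiplies each tile by an all-ones matrix on the left and then on the right, so that two emulated MMA calls collapse $m^2$ values into one. All function names are hypothetical, and the sketch uses exact integer arithmetic, whereas real tensor cores take half-precision inputs and will show rounding effects that are not modelled here.

```python
def mma(a, b, c):
    """Emulate one m x m matrix-multiply-accumulate: D = A x B + C."""
    m = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(m)) + c[i][j] for j in range(m)]
        for i in range(m)
    ]


def reduce_tile(chunk, m):
    """Sum m*m numbers with two MMA calls against an all-ones matrix."""
    tile = [chunk[r * m:(r + 1) * m] for r in range(m)]
    ones = [[1] * m for _ in range(m)]
    zeros = [[0] * m for _ in range(m)]
    col_sums = mma(ones, tile, zeros)    # every row now holds the column sums
    totals = mma(col_sums, ones, zeros)  # every entry now holds the tile total
    return totals[0][0]


def tensor_style_reduce(values, m=16):
    """Return (sum, levels) by repeatedly collapsing tiles of m*m values."""
    vals = list(values)
    if not vals:
        return 0, 0
    levels = 0
    while len(vals) > 1:
        vals += [0] * (-len(vals) % (m * m))  # zero-pad to whole tiles
        vals = [reduce_tile(vals[i:i + m * m], m)
                for i in range(0, len(vals), m * m)]
        levels += 1
    return vals[0], levels
```

For example, `tensor_style_reduce(range(1, 1001), m=4)` returns `(500500, 3)`: the 1000 inputs shrink to 63, then 4, then 1 value. The number of levels grows as $\lceil\log_{m^2}(n)\rceil$, which matches the logarithm base in $T(n)$; the constant factor of 5 belongs to the paper's cost model and is not reproduced by this sketch.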

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.03640/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1903.03640/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/1903.03640/full.md

---
Source: https://tomesphere.com/paper/1903.03640