Leveraging GPU Tensor Cores for Double Precision Euclidean Distance Calculations

Benoit Gallet; Michael Gowanlock

arXiv:2209.11287 · cs.DC · November 21, 2022

Open Access · 1 Repo

TL;DR

This paper introduces the first FP64 Euclidean distance algorithm that runs on GPU Tensor Cores (TCs). It reports an average speedup of 1.28x, and up to 2.23x, over CUDA cores on data similarity tasks, and shows that TCs can serve general-purpose computations beyond machine learning.

Contribution

It presents the first double precision Euclidean distance algorithm optimized for Tensor Cores, expanding their application beyond machine learning to general-purpose high-precision computations.

Findings

01 Average speedup of 1.28x over CUDA cores

02 Up to 2.23x speedup in low-dimensional data

03 Performance depends strongly on data dimensionality

Abstract

Tensor cores (TCs) are a type of Application-Specific Integrated Circuit (ASIC) and are a recent addition to Graphics Processing Unit (GPU) architectures. As such, TCs are purposefully designed to greatly improve the performance of Matrix Multiply-Accumulate (MMA) operations. While TCs are heavily studied for machine learning and closely related fields, where their high efficiency is undeniable, MMA operations are not unique to these fields. More generally, any computation that can be expressed as MMA operations can leverage TCs, and potentially benefit from their higher computational throughput compared to other general-purpose cores, such as CUDA cores on Nvidia GPUs. In this paper, we propose the first double precision (FP64) Euclidean distance calculation algorithm, which is expressed as MMA operations to leverage TCs on Nvidia GPUs, rather than the more commonly used CUDA cores. To…
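To make the idea of expressing a distance calculation as MMA operations concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's algorithm: it uses the standard expansion |p - q|^2 = |p|^2 + |q|^2 - 2(p · q), which may differ from the formulation the authors use, and the helper names `mma` and `pairwise_sq_dists` are hypothetical. The `mma` function stands in for the hardware operation D = A × B + C that a Tensor Core performs on small matrix tiles.

```python
def mma(a, b, c):
    """Return a @ b + c for small dense matrices stored as lists of rows.

    Software stand-in for a Tensor Core matrix multiply-accumulate.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [c[i][j] + sum(a[i][k] * b[k][j] for k in range(inner))
         for j in range(cols)]
        for i in range(rows)
    ]


def pairwise_sq_dists(points):
    """Squared Euclidean distances between all pairs of points.

    Assumption: uses |p - q|^2 = |p|^2 + |q|^2 - 2 * (p . q),
    so the cross terms reduce to a single matrix product.
    """
    n = len(points)
    norms = [sum(x * x for x in p) for p in points]
    # Accumulator C[i][j] = |p_i|^2 + |p_j|^2.
    c = [[norms[i] + norms[j] for j in range(n)] for i in range(n)]
    # A = -2 * P (n x d) and B = P^T (d x n).
    a = [[-2.0 * x for x in p] for p in points]
    b = [list(col) for col in zip(*points)]
    # One multiply-accumulate yields every pairwise squared distance.
    return mma(a, b, c)


points = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
dists = pairwise_sq_dists(points)
```

In this sketch all of the per-pair work is folded into one multiply-accumulate, which is the property that lets such a computation map onto MMA hardware. The expansion can lose precision through cancellation when points are close together, which is one reason double precision matters for this kind of workload.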

Code & Models

Repositories

benoitgallet/ted-join-hipc22
none · Official

Taxonomy

Topics: Tensor decomposition and applications · Parallel Computing and Optimization Techniques · Algorithms and Data Compression