Leveraging GPU Tensor Cores for Double Precision Euclidean Distance Calculations
Benoit Gallet, Michael Gowanlock

TL;DR
This paper introduces the first FP64 Euclidean distance algorithm that leverages GPU Tensor Cores (TCs). It demonstrates notable speedups on data similarity tasks, particularly for low-dimensional data, and opens new avenues for general-purpose computation on TCs.
Contribution
The paper presents the first double precision (FP64) Euclidean distance algorithm optimized for Tensor Cores, extending their use beyond machine learning to general-purpose, high-precision computation.
Findings
Average speedup of 1.28x over CUDA cores
Up to 2.23x speedup on low-dimensional data
Performance gain depends strongly on data dimensionality
Abstract
Tensor cores (TCs) are a type of Application-Specific Integrated Circuit (ASIC) and are a recent addition to Graphics Processing Unit (GPU) architectures. As such, TCs are purposefully designed to greatly improve the performance of Matrix Multiply-Accumulate (MMA) operations. While TCs are heavily studied for machine learning and closely related fields, where their high efficiency is undeniable, MMA operations are not unique to these fields. More generally, any computation that can be expressed as MMA operations can leverage TCs, and potentially benefit from their higher computational throughput compared to other general-purpose cores, such as CUDA cores on Nvidia GPUs. In this paper, we propose the first double precision (FP64) Euclidean distance calculation algorithm, which is expressed as MMA operations to leverage TCs on Nvidia GPUs, rather than the more commonly used CUDA cores. To…
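The abstract does not say how the distance calculation is mapped onto MMA operations. One common approach, assumed here and not necessarily the authors' formulation, expands the squared distance as ||p - q||^2 = ||p||^2 + ||q||^2 - 2 p·q. With this expansion, the cross term for all point pairs becomes a single matrix product P Qᵀ, and that product can be accumulated tile by tile as D = A × B + C. The sketch below emulates this process in pure Python using the m8n8k4 FP64 tile shape that CUDA's WMMA API exposes on Ampere-class GPUs. The helper names (`mma_tile`, `cross_term`, `squared_distances`) are hypothetical, and nothing in the sketch runs on a Tensor Core.

```python
import random

# Assumed FP64 MMA tile shape (m8n8k4): D[8x8] = A[8x4] @ B[4x8] + C[8x8].
M, N, K = 8, 8, 4


def mma_tile(a, b, c):
    """Emulate one FP64 MMA on small tiles: returns D = A @ B + C (lists of lists)."""
    return [[c[i][j] + sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def pad(rows, n_rows, n_cols):
    """Zero-pad row vectors to n_rows x n_cols; zero padding leaves dot products unchanged."""
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i, r in enumerate(rows):
        out[i][:len(r)] = [float(x) for x in r]
    return out


def round_up(x, m):
    """Smallest multiple of m that is >= x."""
    return -(-x // m) * m


def cross_term(p, q):
    """G[i][j] = p_i . q_j, accumulated over chunks of K dimensions, one tile at a time."""
    rp, rq, rd = round_up(len(p), M), round_up(len(q), N), round_up(len(p[0]), K)
    P, Q = pad(p, rp, rd), pad(q, rq, rd)
    G = [[0.0] * rq for _ in range(rp)]
    for i0 in range(0, rp, M):
        for j0 in range(0, rq, N):
            acc = [[0.0] * N for _ in range(M)]  # accumulator tile C
            for k0 in range(0, rd, K):
                a = [row[k0:k0 + K] for row in P[i0:i0 + M]]                   # M x K tile of P
                b = [[Q[j0 + j][k0 + k] for j in range(N)] for k in range(K)]  # K x N tile of Q^T
                acc = mma_tile(a, b, acc)
            for i in range(M):
                G[i0 + i][j0:j0 + N] = acc[i]
    return [row[:len(q)] for row in G[:len(p)]]


def squared_distances(p, q):
    """||p_i - q_j||^2 = ||p_i||^2 + ||q_j||^2 - 2 p_i . q_j, clamped at 0."""
    g = cross_term(p, q)
    norm_p = [sum(x * x for x in r) for r in p]
    norm_q = [sum(x * x for x in r) for r in q]
    return [[max(norm_p[i] + norm_q[j] - 2.0 * g[i][j], 0.0) for j in range(len(q))]
            for i in range(len(p))]


# Usage: compare against a direct pairwise computation on 20 random 5-D points.
random.seed(0)
pts = [[random.random() for _ in range(5)] for _ in range(20)]  # 5 dims: not a multiple of K
d2 = squared_distances(pts, pts)
direct = [[sum((a - b) ** 2 for a, b in zip(u, v)) for v in pts] for u in pts]
max_err = max(abs(x - y) for rx, ry in zip(d2, direct) for x, y in zip(rx, ry))
print(f"max abs error vs. direct computation: {max_err:.2e}")
```

The expanded form trades the per-pair subtractions for one large matrix product, which is what makes it a fit for MMA hardware. It can suffer cancellation when two points are nearly identical, which is why the sketch clamps small negative results to zero.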
Taxonomy
Topics: Tensor decomposition and applications · Parallel Computing and Optimization Techniques · Algorithms and Data Compression
