Benchmarking the Linear Algebra Awareness of TensorFlow and PyTorch

Aravind Sankaran, Navid Akbari Alashti, Christos Psarras, Paolo Bientinesi

arXiv:2202.09888 · cs.MS · August 9, 2022

TL;DR

This paper benchmarks TensorFlow and PyTorch to assess whether they exploit linear algebra knowledge to optimize computations, and it reveals missed opportunities for performance improvements in common matrix operations.

Contribution

It develops benchmarks to evaluate the linear algebra optimization capabilities of TensorFlow and PyTorch, highlighting specific missing optimizations and providing guidelines for performance improvements.

Findings

01 Several linear algebra optimizations are missing in TensorFlow (TF) and PyTorch (PyT).

02 Opportunities exist to reduce the number of scalar operations by applying algebraic laws.

03 The frameworks could better identify the optimal parenthesization of matrix chains.
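
The findings on algebraic laws and matrix chain parenthesization can be illustrated by counting scalar operations. The sketch below is not taken from the paper: the function names, the convention of counting one multiply-add as 2 FLOPs, and the example expressions (a chain A B x with square A, B and a vector x, and the distributive rewrite A B + A C = A (B + C)) are assumptions chosen for illustration, not necessarily the paper's benchmark cases.

```python
def matmul_flops(m, k, n):
    """FLOPs for an (m x k) @ (k x n) product, assuming 2 FLOPs per multiply-add."""
    return 2 * m * k * n


def chain_cost_left_to_right(dims):
    """Cost of evaluating M1 @ M2 @ ... @ Mp strictly left to right.

    Matrix i has shape dims[i] x dims[i + 1].
    """
    total = 0
    rows = dims[0]  # the running product always has dims[0] rows
    for i in range(1, len(dims) - 1):
        total += matmul_flops(rows, dims[i], dims[i + 1])
    return total


def chain_cost_optimal(dims):
    """Minimum cost over all parenthesizations (classic dynamic program)."""
    p = len(dims) - 1  # number of matrices in the chain
    cost = [[0] * p for _ in range(p)]
    for length in range(2, p + 1):
        for i in range(p - length + 1):
            j = i + length - 1
            # Split the chain i..j into (i..k) and (k+1..j), keep the cheapest.
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j]
                + matmul_flops(dims[i], dims[k + 1], dims[j + 1])
                for k in range(i, j)
            )
    return cost[0][p - 1]


def distributive_costs(n):
    """FLOPs for A@B + A@C versus A@(B + C), all matrices n x n."""
    expanded = 2 * matmul_flops(n, n, n) + n * n  # two products, one addition
    factored = n * n + matmul_flops(n, n, n)      # one addition, one product
    return expanded, factored


if __name__ == "__main__":
    dims = [1000, 1000, 1000, 1]  # A (1000x1000), B (1000x1000), x (1000x1)
    print("left to right:", chain_cost_left_to_right(dims))
    print("optimal:      ", chain_cost_optimal(dims))
    print("distributive: ", distributive_costs(1000))
```

Under these assumptions, evaluating (A B) x costs roughly 2n^3 FLOPs while A (B x) costs roughly 4n^2, and factoring out A halves the number of matrix products. A framework that evaluates the expression exactly as written pays the higher cost unless the user rewrites it by hand.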

Abstract

Linear algebra operations, which are ubiquitous in machine learning, form major performance bottlenecks. The High-Performance Computing community invests significant effort in the development of architecture-specific optimized kernels, such as those provided by the BLAS and LAPACK libraries, to speed up linear algebra operations. However, end users are progressively less likely to go through the error-prone and time-consuming process of directly using said kernels; instead, frameworks such as TensorFlow (TF) and PyTorch (PyT), which facilitate the development of machine learning applications, are becoming more and more popular. Although such frameworks link to BLAS and LAPACK, it is not clear whether or not they make use of linear algebra knowledge to speed up computations. For this reason, in this paper we develop benchmarks to investigate the linear algebra optimization capabilities…
