GEMMFIP: Unifying GEMM in BLIS

RuQing G. Xu; Field G. Van Zee; Robert A. van de Geijn

arXiv:2302.08417 · cs.MS · February 20, 2023

TL;DR

This paper introduces GEMMFIP, a unified approach for high-performance matrix multiplication across small and large matrices by fusing packing with computation, simplifying tuning and improving efficiency.

Contribution

It proposes a novel technique that unifies optimization strategies for small and large matrices in GEMM, implemented within the BLIS framework.

Findings

01 Achieves high performance for both small and large matrices

02 Simplifies tuning of general-purpose matrix libraries

03 Demonstrates effectiveness across multiple architectures

Abstract

Matrix libraries often focus on achieving high performance for problems considered to be either "small" or "large", as these two scenarios tend to respond best to different optimization strategies. We propose a unified technique for implementing matrix operations like general matrix multiplication (GEMM) that can achieve high performance for both small and large problem sizes. The key is to fuse packing -- an operation that copies data to a contiguous layout in memory and which is critical for large matrix performance -- with the first computational "pass" over that data. This boosts performance across the problem size spectrum. As a result, tuning general-purpose libraries becomes simpler since it obviates the need to carefully express and parameterize logic that chooses between a "small matrix" strategy and a "large matrix" strategy. A prototype implementation of the technique built…
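The fusion described in the abstract can be illustrated with a small sketch. This is not the BLIS or GEMMFIP implementation; the function name `gemm_fused_pack`, the block sizes `mr` and `nr`, and the choice to pack only column panels of B are assumptions made for illustration. The idea shown is that the first pass over a panel of B reads from the original layout and writes the contiguous packed copy as a side effect, so later passes reuse the packed data and no separate packing sweep is needed.

```python
def gemm_fused_pack(A, B, mr=2, nr=2):
    """Compute C = A * B for non-empty lists of lists.

    Illustrative only: each nr-wide column panel of B is packed into a
    contiguous buffer *during* the first computational pass that uses it,
    instead of in a separate packing step beforehand.
    """
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]

    for jc in range(0, n, nr):              # column panels of B
        nb = min(nr, n - jc)
        packed = [0.0] * (k * nb)           # contiguous k x nb buffer

        for ic in range(0, m, mr):          # row blocks of A
            first_pass = (ic == 0)
            for p in range(k):
                for j in range(nb):
                    if first_pass:
                        # Read from the original layout and pack as we go.
                        b = B[p][jc + j]
                        packed[p * nb + j] = b
                    else:
                        # Later passes read the packed, contiguous copy.
                        b = packed[p * nb + j]
                    for i in range(ic, min(ic + mr, m)):
                        C[i][jc + j] += A[i][p] * b
    return C
```

When A has only one row block (a "small" problem in this sketch), the packed buffer is written but never re-read, so the cost is close to an unpacked multiply; with many row blocks, the packing cost is amortized over the later passes. That is the sense in which a single code path can serve both regimes, under the simplifying assumptions above.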

Code & Models

Repositories

xrq-phys/blis (Official)

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Interconnection Networks and Systems