Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors
Sandra Catalán, Francisco D. Igual, Rafael Mayo, Rafael Rodríguez-Sánchez, Enrique S. Quintana-Ortí

TL;DR
This paper presents architecture-aware optimizations for matrix multiplication on ARM big.LITTLE asymmetric multicore processors, improving performance and energy efficiency through cache-aware configuration and asymmetric scheduling strategies.
Contribution
It introduces a high-performance, energy-efficient GEMM implementation tailored for ARM big.LITTLE architectures using cache-aware and asymmetric scheduling techniques.
Findings
Significant performance improvements over architecture-oblivious implementations.
Enhanced energy efficiency on ARM big.LITTLE processors.
Effective utilization of heterogeneous cores for scientific computing.
Abstract
Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high-performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications. In this paper, we design and embed several architecture-aware optimizations into a multi-threaded general matrix multiplication (gemm), a key operation of the BLAS, in order to obtain a high-performance implementation for ARM big.LITTLE AMPs. Our solution is based on the reference implementation of gemm in the BLIS library, and integrates a cache-aware configuration as well as asymmetric static and dynamic scheduling strategies that carefully tune and…
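To make the idea of an asymmetric static schedule more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that one loop of gemm over n columns is split once, ahead of time, between the big and LITTLE clusters in proportion to their relative throughput, with the boundary rounded down to a multiple of the micro-kernel width. The function name `asymmetric_static_split` and the parameters `ratio_big` and `nr` are hypothetical and do not come from the paper or from BLIS.

```python
def asymmetric_static_split(n, ratio_big, nr=4):
    """Split n columns between a big and a LITTLE cluster (illustrative only).

    n         -- number of columns to distribute
    ratio_big -- assumed throughput of the big cluster relative to the
                 LITTLE cluster (3.0 means "about three times faster")
    nr        -- assumed micro-kernel column width; the split point is
                 rounded down to a multiple of nr

    Returns two half-open ranges: (big_range, little_range).
    """
    if n < 0 or ratio_big <= 0 or nr <= 0:
        raise ValueError("n must be >= 0; ratio_big and nr must be > 0")

    # Ideal share of the big cluster if work were infinitely divisible.
    ideal = n * ratio_big / (ratio_big + 1.0)

    # Round down to a whole number of micro-kernel panels.
    split = (int(ideal) // nr) * nr

    return (0, split), (split, n)


if __name__ == "__main__":
    big, little = asymmetric_static_split(1024, ratio_big=3.0, nr=4)
    print("big cluster columns:   ", big)
    print("LITTLE cluster columns:", little)
```

A static split like this is only as good as the assumed ratio. A dynamic strategy would instead let each cluster request the next chunk of work when it finishes the previous one, trading some scheduling overhead for robustness to a mis-estimated ratio.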
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Interconnection Networks and Systems · Advanced Data Storage Technologies
