# MatRox: Modular approach for improving data locality in Hierarchical (Mat)rix App(Rox)imation

**Authors:** Bangtian Liu, Kazem Cheshmi, Saeed Soori, Michelle Mills Strout, and Maryam Mehri Dehnavi

arXiv: 1812.07152 · 2019-12-03

## TL;DR

MatRox is a modular framework that improves data locality and load balance in hierarchical matrix (HMatrix) approximations. Its generated code for HMatrix-matrix multiplication runs 1.60x to 5.98x faster than GOFMM, SMASH, and STRUMPACK, and its modular compression phase lets computations be reused when the requested accuracy or the kernel function changes.

## Contribution

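The paper presents a modular approach that combines structure analysis (blocking and coarsening), code specialization, and a storage format to improve locality and create load-balanced parallel tasks for HMatrix-matrix multiplication. It also modularizes the compression phase so that computations can be reused when the input accuracy or the kernel function changes.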
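To make the reuse idea concrete, here is a minimal sketch, assuming compression can be split into an accuracy-independent step and an accuracy-dependent step. The function names (`analyze_structure`, `compress`), the `digits` parameter, and the toy rank rule are all hypothetical and do not come from MatRox; the sketch only shows the first step being computed once and reused across accuracy changes.

```python
from functools import lru_cache

# Hypothetical stand-ins for illustration; none of these names come from MatRox.
calls = {"structure": 0, "compress": 0}


@lru_cache(maxsize=None)
def analyze_structure(points):
    """Accuracy-independent step: split the sorted points into two clusters."""
    calls["structure"] += 1
    ordered = tuple(sorted(points))
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


def compress(points, digits):
    """Accuracy-dependent step: reuses the cached structure for every accuracy."""
    calls["compress"] += 1
    left, right = analyze_structure(points)
    # Toy rule (an assumption): more requested digits keeps a larger rank.
    rank = min(digits, len(left))
    return {"clusters": (left, right), "rank": rank}


points = (0.9, 0.1, 0.4, 0.7, 0.2, 0.6)
results = [compress(points, digits) for digits in (1, 2, 3)]
ranks = [r["rank"] for r in results]
```

After the three calls, `calls["structure"]` is 1 while `calls["compress"]` is 3, so only the accuracy-dependent step is repeated.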

## Key findings

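- MatRox-generated code for matrix-matrix multiplication is 2.98x, 1.60x, and 5.98x faster than the library implementations in GOFMM, SMASH, and STRUMPACK, respectively.
- Reusing portions of the compression computation yields up to a 2.64x improvement over GOFMM across five changes to the accuracy.
- Structure analysis, code specialization, and a storage format improve locality and produce load-balanced parallel tasks for HMatrix-matrix multiplication.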
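As a rough illustration of what "load-balanced parallel tasks" means, the sketch below assigns tasks with known costs to workers using the generic longest-processing-time greedy heuristic. This is a swapped-in textbook technique, not MatRox's coarsening algorithm; `balance_tasks` and the `subtree_costs` values are hypothetical.

```python
import heapq


def balance_tasks(costs, workers):
    """Greedy longest-processing-time assignment of task costs to workers."""
    heap = [(0, w) for w in range(workers)]  # (current load, worker id)
    assignment = [[] for _ in range(workers)]
    # Visit tasks from most to least expensive.
    for task, cost in sorted(enumerate(costs), key=lambda tc: -tc[1]):
        load, w = heapq.heappop(heap)  # least-loaded worker so far
        assignment[w].append(task)
        heapq.heappush(heap, (load + cost, w))
    loads = [sum(costs[t] for t in group) for group in assignment]
    return assignment, loads


# Hypothetical per-subtree work estimates.
subtree_costs = [8, 7, 6, 5, 4, 3, 2, 1]
assignment, loads = balance_tasks(subtree_costs, workers=3)
```

With these costs the three workers end up with loads of 13, 12, and 11 out of a total of 36, so no worker waits long for the others at a synchronization point.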

## Abstract

Hierarchical matrix approximations have gained significant traction in the machine learning and scientific community as they exploit available low-rank structures in kernel methods to compress the kernel matrix. The resulting compressed matrix, HMatrix, is used to reduce the computational complexity of operations such as HMatrix-matrix multiplications with tuneable accuracy in an evaluation phase. Existing implementations of HMatrix evaluations do not preserve locality and often lead to unbalanced parallel execution with high synchronization. Also, current solutions require the compression phase to re-execute if the kernel method or the required accuracy change. In this work, we describe MatRox, a framework that uses novel structure analysis strategies, blocking and coarsen, with code specialization and a storage format to improve locality and create load-balanced parallel tasks for HMatrix-matrix multiplications. Modularization of the matrix compression phase enables the reuse of computations when there are changes to the input accuracy and the kernel function. The MatRox-generated code for matrix-matrix multiplication is 2.98x, 1.60x, and 5.98x faster than library implementations available in GOFMM, SMASH, and STRUMPACK respectively. Additionally, the ability to reuse portions of the compression computation for changes to the accuracy leads to up to 2.64x improvement with MatRox over five changes to accuracy using GOFMM.
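The complexity reduction mentioned in the abstract can be sketched for a single low-rank block. Assuming a block `K` is stored in factored form as `U` times the transpose of `V` with rank `r`, multiplying it by an `n x q` matrix `W` in factored order needs about `2*n*r*q` multiplications instead of `n*n*q`. The matrices below are hypothetical and the block is exactly rank `r` for simplicity; a real compression would only approximate `K` to the requested accuracy.

```python
def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


n, r, q = 6, 2, 3
# Hypothetical integer factors so the comparison below is exact.
U = [[i + 1, (i % 2) * 2 - 1] for i in range(n)]          # n x r
V = [[1, j] for j in range(n)]                            # n x r
W = [[(i + j) % 4 for j in range(q)] for i in range(n)]   # n x q

K = matmul(U, transpose(V))                     # explicit n x n block
dense = matmul(K, W)                            # uses n*n*q multiplications
factored = matmul(U, matmul(transpose(V), W))   # uses 2*n*r*q multiplications

dense_mults = n * n * q
factored_mults = 2 * n * r * q
```

Both orders give the same result here, and the gap between `dense_mults` and `factored_mults` widens as `n` grows relative to `r`.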

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.07152/full.md

## Figures

27 figures with captions in the complete paper: https://tomesphere.com/paper/1812.07152/full.md

## References

58 references — full list in the complete paper: https://tomesphere.com/paper/1812.07152/full.md

---
Source: https://tomesphere.com/paper/1812.07152