# Impact of Traditional Sparse Optimizations on a Migratory Thread Architecture

**Authors:** Thomas B. Rolinger, Christopher D. Krieger

arXiv: 1812.05955 · 2018-12-17

## TL;DR

This paper evaluates traditional sparse matrix optimizations for SpMV on the Emu migratory thread architecture. It finds that matrix reordering encourages more consistent load balancing and improves performance by up to 70%, compared with a gain of no more than 16% on a cache-memory based system.

## Contribution

It provides the first detailed analysis of sparse optimization effects on the Emu migratory thread architecture, highlighting the importance of reordering for performance gains.

## Key findings

- Reordering improves Emu SpMV performance by up to 70%.
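- An even initial work distribution does not stay balanced over time because Emu threads migrate; in severe cases, the sparsity pattern creates hot-spots where many threads converge on a single resource.
- Reordering gains on Emu (up to 70%) exceed those on a cache-memory based system (no more than 16%).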

## Abstract

Achieving high performance for sparse applications is challenging due to irregular access patterns and weak locality. These properties preclude many static optimizations and degrade cache performance on traditional systems. To address these challenges, novel systems such as the Emu architecture have been proposed. The Emu design uses light-weight migratory threads, narrow memory, and near-memory processing capabilities to address weak locality and reduce the total load on the memory system. Because the Emu architecture is fundamentally different than cache based hierarchical memory systems, it is crucial to understand the cost-benefit tradeoffs of standard sparse algorithm optimizations on Emu hardware. In this work, we explore sparse matrix-vector multiplication (SpMV) on the Emu architecture. We investigate the effects of different sparse optimizations such as dense vector data layouts, work distributions, and matrix reorderings. Our study finds that initially distributing work evenly across the system is inadequate to maintain load balancing over time due to the migratory nature of Emu threads. In severe cases, matrix sparsity patterns produce hot-spots as many migratory threads converge on a single resource. We demonstrate that known matrix reordering techniques can improve SpMV performance on the Emu architecture by as much as 70% by encouraging more consistent load balancing. This can be compared with a performance gain of no more than 16% on a cache-memory based system.
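The hot-spot effect described in the abstract can be illustrated with a small sketch. This is not the paper's code: it is a simplified model in plain Python that assumes the dense vector `x` is spread over a few abstract "nodes" and that every read of `x[j]` is served by the node that owns element `j`. The function names, the two layouts (`"block"` and `"cyclic"`), and the 4x4 example matrix are all hypothetical choices made for illustration.

```python
from collections import Counter


def spmv_csr(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def accesses_per_node(col_idx, n_cols, n_nodes, layout):
    """Count how many reads of x land on each node (assumed model).

    layout == "cyclic": element j lives on node j % n_nodes.
    layout == "block":  x is split into contiguous chunks, one per node.
    """
    chunk = -(-n_cols // n_nodes)  # ceiling division
    if layout == "cyclic":
        owner = lambda j: j % n_nodes
    else:
        owner = lambda j: j // chunk
    counts = Counter(owner(j) for j in col_idx)
    return [counts.get(p, 0) for p in range(n_nodes)]


# Hypothetical 4x4 matrix whose nonzeros cluster in columns 0 and 1:
# [1 2 0 0]
# [3 4 0 0]
# [5 0 6 0]
# [0 7 0 8]
row_ptr = [0, 2, 4, 6, 8]
col_idx = [0, 1, 0, 1, 0, 2, 1, 3]
vals = [1, 2, 3, 4, 5, 6, 7, 8]
x = [1, 1, 1, 1]

y = spmv_csr(row_ptr, col_idx, vals, x)
block_load = accesses_per_node(col_idx, 4, 2, "block")
cyclic_load = accesses_per_node(col_idx, 4, 2, "cyclic")
print(y, block_load, cyclic_load)
```

In this toy case every row holds two nonzeros, so the rows themselves are evenly sized, yet under the block layout six of the eight reads of `x` target node 0. Under the cyclic layout the reads split four and four. On a real migratory-thread system the imbalance would show up as threads converging on one resource; the counts here only approximate that behavior under the stated assumptions.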

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.05955/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/1812.05955/full.md

## References

12 references — full list in the complete paper: https://tomesphere.com/paper/1812.05955/full.md

---
Source: https://tomesphere.com/paper/1812.05955