Comparative study of space filling curves for cache oblivious TU Decomposition

Fatima K. Abu Salem; Mira Al Arab

arXiv:1612.06069·cs.SC·December 20, 2016

Open Access

TL;DR

This paper compares different space-filling curve-based matrix layouts for cache-oblivious parallel TU decomposition, finding Morton-hybrid order to be most efficient in index conversion and overall performance, especially for large matrices.

Contribution

It provides a detailed analysis of index conversion costs for various space-filling curves and demonstrates the superior performance of Morton-hybrid layout in cache-oblivious matrix decomposition.

Findings

1. Morton-hybrid order has the lowest index conversion cost.

2. Morton-hybrid layout achieves significant performance improvements for large matrices.

3. Preliminary experiments show orders of magnitude faster computation with Morton-hybrid layout.

Abstract

We examine several matrix layouts based on space-filling curves that allow for a cache-oblivious adaptation of parallel TU decomposition for rectangular matrices over finite fields. The TU algorithm of [Dumas] requires index conversion routines for which the cost to encode and decode the chosen curve is significant. Using a detailed analysis of the number of bit operations required for the encoding and decoding procedures, and filtering the cost of lookup tables that represent the recursive decomposition of the Hilbert curve, we show that the Morton-hybrid order incurs the least cost for the index conversion routines required throughout the matrix decomposition, as compared to the Hilbert, Peano, or Morton orders. The motivation is that cache-efficient parallel adaptations whose natural sequential evaluation order demonstrates a lower cache miss rate result in…
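To make the index conversion discussed in the abstract concrete, here is a minimal Python sketch of encoding and decoding for the plain Morton order and for a Morton-hybrid order. It is an illustration under stated assumptions, not the authors' routines: the function names are hypothetical, the block size is assumed to be a power of two, the hybrid layout is assumed to place blocks in Morton order with row-major order inside each block, and the bit interleaving uses a simple loop rather than any optimized bit manipulation analyzed in the paper.

```python
from typing import Tuple


def morton_encode(row: int, col: int, bits: int = 16) -> int:
    """Interleave bits: column bits go to even positions, row bits to odd ones."""
    z = 0
    for k in range(bits):
        z |= ((col >> k) & 1) << (2 * k)
        z |= ((row >> k) & 1) << (2 * k + 1)
    return z


def morton_decode(z: int, bits: int = 16) -> Tuple[int, int]:
    """Invert morton_encode by de-interleaving the even and odd bits."""
    row = col = 0
    for k in range(bits):
        col |= ((z >> (2 * k)) & 1) << k
        row |= ((z >> (2 * k + 1)) & 1) << k
    return row, col


def morton_hybrid_encode(row: int, col: int, block: int) -> int:
    """Assumed hybrid layout: Morton order across block x block tiles,
    row-major order inside each tile. `block` must be a power of two."""
    shift = block.bit_length() - 1           # log2(block)
    mask = block - 1
    tile = morton_encode(row >> shift, col >> shift)
    within = ((row & mask) << shift) | (col & mask)
    return (tile << (2 * shift)) | within


def morton_hybrid_decode(index: int, block: int) -> Tuple[int, int]:
    """Invert morton_hybrid_encode under the same assumptions."""
    shift = block.bit_length() - 1
    mask = block - 1
    within = index & (block * block - 1)
    tile_row, tile_col = morton_decode(index >> (2 * shift))
    row = (tile_row << shift) | (within >> shift)
    col = (tile_col << shift) | (within & mask)
    return row, col
```

In this sketch only the high-order bits of each coordinate (the tile coordinates) are interleaved; the low-order bits are concatenated directly. That gives an intuition for why a hybrid order can need fewer bit operations per conversion than a full Morton order, although the exact operation counts are those derived in the paper, not the ones this loop-based sketch happens to incur.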

Taxonomy

Topics: Coding theory and cryptography · Parallel Computing and Optimization Techniques · Low-power high-performance VLSI design