On Optimizing Locality of Graph Transposition on Modern Architectures
Mohsen Koohi Esfahani, Hans Vandierendonck

TL;DR
This paper introduces PoTra, a new graph transposition algorithm optimized for modern architectures that significantly improves performance by leveraging graph structure and memory locality, achieving up to 8.7x speedup on large datasets.
Contribution
PoTra is a novel graph transposition (GT) algorithm that optimizes locality and performance by exploiting graph structure and architecture-specific features, reducing memory overhead and improving speed.
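To make the GT problem concrete, the C++ sketch below transposes a graph stored in Compressed Sparse Row (CSR) form using a two-pass, counting-sort style scan. It is a plain sequential baseline assumed for illustration, not PoTra. The `CSRGraph` struct and the `transpose_sequential` function are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CSR layout: the out-neighbours of vertex v are stored in
// neighbors[offsets[v] .. offsets[v + 1]).
struct CSRGraph {
    std::vector<std::uint64_t> offsets;    // |V| + 1 entries
    std::vector<std::uint32_t> neighbors;  // |E| entries
};

// Sequential baseline (not PoTra): every edge (u, v) of g becomes (v, u),
// so out-neighbour lists turn into in-neighbour lists.
CSRGraph transpose_sequential(const CSRGraph& g) {
    const std::size_t n = g.offsets.size() - 1;  // assumes offsets is non-empty
    CSRGraph t;
    t.offsets.assign(n + 1, 0);
    t.neighbors.resize(g.neighbors.size());

    // Pass 1: count the in-degree of each vertex (stored shifted by one slot).
    for (std::uint32_t v : g.neighbors) t.offsets[v + 1]++;

    // Prefix sum over in-degrees: offsets[v] becomes the start of v's new list.
    for (std::size_t v = 0; v < n; ++v) t.offsets[v + 1] += t.offsets[v];

    // Pass 2: scatter each edge (u, v) into v's list via a per-vertex cursor.
    std::vector<std::uint64_t> cursor(t.offsets.begin(), t.offsets.end() - 1);
    for (std::size_t u = 0; u < n; ++u)
        for (std::uint64_t e = g.offsets[u]; e < g.offsets[u + 1]; ++e)
            t.neighbors[cursor[g.neighbors[e]]++] = static_cast<std::uint32_t>(u);
    return t;
}
```

Because source vertices are scanned in increasing order, every in-neighbour list comes out sorted. The second pass writes to effectively random positions of `t.neighbors`, which is one source of the poor memory locality that GT algorithms have to contend with.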
Findings
PoTra achieves up to 8.7x speedup over previous algorithms.
Where PoTra is slower than previous algorithms, its performance loss is limited to 15.7% on average.
PoTra performs well across multiple CPU architectures and large datasets.
Abstract
This paper investigates the shared-memory Graph Transposition (GT) problem, a fundamental graph algorithm that is widely used in graph analytics and scientific computing. Previous GT algorithms have significant memory requirements that are proportional to the number of vertices and threads, which obstructs their use on large graphs. Moreover, atomic memory operations have become comparatively fast on recent CPU architectures, which creates new opportunities for improving the performance of concurrent atomic accesses in GT. We design PoTra, a GT algorithm which leverages graph structure and processor and memory architecture to optimize locality and performance. PoTra limits the size of additional data structures close to CPU cache sizes and utilizes the skewed degree distribution of graph datasets to optimize locality and performance. We present the performance model of PoTra to explain…
