Tascade: Hardware Support for Atomic-free, Asynchronous and Efficient Reduction Trees

Marcelo Orenes-Vera; Esin Tureci; David Wentzlaff; Margaret Martonosi

arXiv:2311.15810 · cs.AR · April 23, 2024 · 1 citation

TL;DR

Tascade is a hardware-software co-design that enables scalable, asynchronous reduction trees for large manycore servers, significantly reducing communication overhead and improving performance for irregular graph workloads.

Contribution

Tascade introduces a novel execution model and hardware support for efficient, atomic-free reductions, scaling to a million processing units (PUs) in large-scale parallel systems.

Findings

01. Achieves over 7600 GTEPS (billions of traversed edges per second) in breadth-first search (BFS) on RMAT-26 with a million PUs.

02. Reduces communication and power consumption compared to prior approaches.

03. Scales efficiently for irregular workloads on large manycore architectures.
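The BFS result above is reported in GTEPS. TEPS (traversed edges per second) is the throughput metric used by the Graph500 BFS benchmark, and "RMAT-26" conventionally denotes an R-MAT synthetic graph at scale 26, that is, 2^26 vertices. The helper below is a hypothetical illustration of the unit conversion only; it does not reproduce the paper's measurement methodology or edge counts.

```python
def gteps(traversed_edges: int, seconds: float) -> float:
    """Convert a traversal count and elapsed time into GTEPS.

    GTEPS = traversed edges / elapsed seconds / 1e9.
    Hypothetical helper for illustration; not from the Tascade paper.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return traversed_edges / seconds / 1e9


# Example: 3 billion edges traversed in 1.5 s is 2 GTEPS.
print(gteps(3_000_000_000, 1.5))  # 2.0
```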

Abstract

Graph search and sparse data-structure traversal workloads contain challenging irregular memory patterns on global data structures that need to be modified atomically. Distributed processing of these workloads has relied on server threads operating on their own data copies that are merged upon global synchronization. As parallelism increases within each server, the communication challenges that arose in distributed systems a decade ago are now being encountered within large manycore servers. Prior work has achieved scalability for sparse applications up to thousands of PUs on-chip, but does not scale further due to increasing communication distances and load-imbalance across PUs. To address these challenges we propose Tascade, a hardware-software co-design that offers support for storage-efficient data-private reductions as well as asynchronous and opportunistic reduction trees. Tascade…
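The abstract contrasts atomic updates to shared global data with the data-private pattern used in distributed processing: each thread updates its own copy, and the copies are merged at a global synchronization point. The following is a minimal sketch of that baseline pattern, assuming an in-degree count as the workload. `private_updates` and `tree_reduce` are hypothetical names, and the merge here is a single synchronous binary tree. According to the abstract, Tascade's hardware makes its reduction trees asynchronous and opportunistic, which this sketch does not model.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def private_updates(edges_per_worker: Sequence[Sequence[Edge]],
                    num_vertices: int) -> List[List[int]]:
    """Each worker accumulates in-degrees into its own private copy.

    Workers are simulated sequentially here. Because no copy is shared,
    a plain `+=` stands in for what would otherwise be an atomic add.
    """
    copies = []
    for edges in edges_per_worker:
        local = [0] * num_vertices
        for _src, dst in edges:
            local[dst] += 1  # private copy: no other worker touches it
        copies.append(local)
    return copies


def tree_reduce(copies: Sequence[Sequence[int]]) -> List[int]:
    """Merge private copies pairwise, level by level (a binary reduction tree)."""
    if not copies:
        raise ValueError("need at least one copy to reduce")
    level = [list(c) for c in copies]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            # Each parent node sums its two children element-wise.
            nxt.append([a + b for a, b in zip(level[i], level[i + 1])])
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an unpaired node passes up unchanged
        level = nxt
    return level[0]


workers = [[(0, 1), (2, 1)], [(1, 2)], [(0, 2), (1, 0)]]
print(tree_reduce(private_updates(workers, num_vertices=3)))  # [1, 2, 2]
```

Every worker holds a full-size copy of the data, so memory grows with the number of workers. That storage cost is what the abstract's "storage-efficient data-private reductions" is aimed at.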


Taxonomy

Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Advanced Memory and Neural Computing