Dalorex: A Data-Local Program Execution and Architecture for Memory-bound Applications
Marcelo Orenes-Vera, Esin Tureci, David Wentzlaff, Margaret Martonosi

TL;DR
Dalorex is a hardware-software co-design that enables high parallelism and energy efficiency for memory-bound graph and linear algebra workloads, scaling beyond 16,000 cores with local memory and traffic-aware scheduling.
Contribution
Dalorex introduces a novel distributed-memory architecture, task-based programming model, and traffic-optimized network to significantly improve scalability and efficiency for irregular memory access applications.
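To make the idea of data-local, task-based execution concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simple block partition of vertex data across tiles and simulates the tiles sequentially; the names `owner`, `bfs_data_local`, and `remote_msgs` are hypothetical. Each task is sent to the tile that holds the data it touches, so a tile only ever reads or writes its own slice.

```python
from collections import deque


def owner(index, chunk):
    """Tile holding element `index` under an assumed block partition."""
    return index // chunk


def bfs_data_local(adj, source, num_tiles):
    """Hop distances from `source`, computed by routing tasks to data owners.

    Illustrative only: tiles are simulated one after another, and the
    partitioning scheme is an assumption, not taken from the paper.
    """
    n = len(adj)
    chunk = max(1, -(-n // num_tiles))  # ceil(n / num_tiles), at least 1
    dist = [-1] * n  # conceptually sliced across tiles by owner()
    queues = [deque() for _ in range(num_tiles)]  # one task queue per tile
    queues[owner(source, chunk)].append((source, 0))
    remote_msgs = 0  # tasks that had to cross between tiles

    while any(queues):
        for tile, queue in enumerate(queues):
            while queue:
                v, d = queue.popleft()
                # Only the owning tile touches dist[v]; keep the shortest value.
                if dist[v] != -1 and dist[v] <= d:
                    continue
                dist[v] = d
                for u in adj[v]:
                    dest = owner(u, chunk)
                    if dest != tile:
                        remote_msgs += 1
                    # Send the follow-up task to the tile that owns vertex u.
                    queues[dest].append((u, d + 1))
    return dist, remote_msgs


if __name__ == "__main__":
    graph = [[1, 2], [3], [3], [4], [], []]  # vertex 5 is unreachable
    print(bfs_data_local(graph, source=0, num_tiles=2))
```

Because tasks may arrive out of order, the sketch keeps the smallest distance seen for each vertex rather than relying on strict level-by-level traversal. The `remote_msgs` counter hints at why network traffic matters in such a design: every edge that crosses a partition boundary becomes a message.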
Findings
Achieves strong scaling with over 16,000 cores.
Improves performance and energy efficiency by two orders of magnitude over prior PIM work.
Supports scalable graph and sparse linear algebra workloads.
Abstract
Applications with low data reuse and frequent irregular memory accesses, such as graph or sparse linear algebra workloads, fail to scale well due to memory bottlenecks and poor core utilization. While prior work with prefetching, decoupling, or pipelining can mitigate memory latency and improve core utilization, memory bottlenecks persist due to limited off-chip bandwidth. Approaches doing processing-in-memory (PIM) with Hybrid Memory Cube (HMC) overcome bandwidth limitations but fail to achieve high core utilization due to poor task scheduling and synchronization overheads. Moreover, the high memory-per-core ratio available with HMC limits strong scaling. We introduce Dalorex, a hardware-software co-design that achieves high parallelism and energy efficiency, demonstrating strong scaling with >16,000 cores when processing graph and sparse linear algebra workloads. Over the prior work…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Distributed and Parallel Computing Systems
