DX100: A Programmable Data Access Accelerator for Indirection

Alireza Khadem; Kamalavasan Kamalakkannan; Zhenyan Zhu; Akash Poptani; Yufeng Gu; Jered Benjamin Dominguez-Trujillo; Nishil Talati; Daichi Fujiki; Scott Mahlke; Galen Shipman; Reetuparna Das

arXiv:2505.23073·cs.AR·June 3, 2025

TL;DR

DX100 is a programmable accelerator that optimizes indirect memory accesses by reordering and coalescing requests, significantly improving memory bandwidth utilization and application performance across diverse workloads.

Contribution

Introduces DX100, a programmable data access accelerator with a general-purpose ISA and compiler support, to enhance indirect memory access efficiency beyond prior approaches.

Findings

01. Achieves a 2.6x performance improvement over a multicore baseline.
02. Attains a 2.0x performance gain over state-of-the-art indirect prefetchers.
03. Improves DRAM row-buffer hit rate and memory bandwidth utilization.

Abstract

Indirect memory accesses frequently appear in applications where memory bandwidth is a critical bottleneck. Prior indirect memory access proposals, such as indirect prefetchers, runahead execution, fetchers, and decoupled access/execute architectures, primarily focus on improving memory access latency by loading data ahead of computation but still rely on the DRAM controllers to reorder memory requests and enhance memory bandwidth utilization. DRAM controllers have limited visibility into future memory accesses due to the small capacity of request buffers and the restricted memory-level parallelism of conventional core and memory systems. We introduce DX100, a programmable data access accelerator for indirect memory accesses. DX100 is shared across cores to offload bulk indirect memory accesses and associated address calculation operations. DX100 reorders, interleaves, and coalesces…
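
The row-buffer argument in the abstract can be illustrated with a toy model. The sketch below is not DX100's mechanism or ISA. It assumes a single DRAM bank with an open-page policy, a hypothetical row size of 1024 array elements, and a hypothetical 8 elements per cache line, and it models "visibility" as a fixed window of pending requests that may be reordered. All function names and constants are illustrative.

```python
import random

ROW_ELEMS = 1024  # hypothetical: array elements per DRAM row (assumption)


def row_hit_rate(indices, row_elems=ROW_ELEMS):
    """Fraction of accesses that hit the open row (single bank, open-page)."""
    hits, open_row = 0, None
    for idx in indices:
        row = idx // row_elems
        if row == open_row:
            hits += 1
        open_row = row
    return hits / len(indices) if indices else 0.0


def reorder_in_window(indices, window, row_elems=ROW_ELEMS):
    """Group same-row accesses together, but only within each window of requests."""
    out = []
    for start in range(0, len(indices), window):
        chunk = indices[start:start + window]
        # sorted() is stable, so requests to the same row keep their order
        out.extend(sorted(chunk, key=lambda idx: idx // row_elems))
    return out


def coalesce(indices, line_elems=8):
    """Keep only the first request to each cache line (hypothetical line size)."""
    seen, out = set(), []
    for idx in indices:
        line = idx // line_elems
        if line not in seen:
            seen.add(line)
            out.append(idx)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # B plays the role of the index array in an indirect gather A[B[i]]
    B = [rng.randrange(64 * ROW_ELEMS) for _ in range(4096)]
    print("in-order      :", row_hit_rate(B))
    print("window = 16   :", row_hit_rate(reorder_in_window(B, 16)))
    print("window = 4096 :", row_hit_rate(reorder_in_window(B, 4096)))
```

Under these assumptions, a small window stands in for a DRAM controller's limited request buffer, while a window covering the whole index array stands in for an accelerator that sees a bulk indirect access at once. Real controllers also schedule across banks and channels, which this sketch ignores.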


Code & Models

Repositories

arkhadem/dx100 (official)
