FUSE: Fusing STT-MRAM into GPUs to Alleviate Off-Chip Memory Access Overheads
Jie Zhang, Myoungsoo Jung, Mahmut Taylan Kandemir

TL;DR
FUSE is a GPU L1D cache design that pairs STT-MRAM with a small SRAM region. It predicts the read-level of memory accesses and places write-once-read-multiple data in STT-MRAM, which reduces off-chip memory accesses and improves performance and energy efficiency.
Contribution
This work introduces FUSE, a new GPU cache architecture that combines STT-MRAM with SRAM to minimize off-chip memory traffic and improve GPU performance.
Findings
Reduces outgoing memory references by 32%.
Improves GPU performance by 217%.
Reduces energy cost by 53%.
Abstract
In this work, we propose FUSE, a novel GPU cache system that integrates spin-transfer torque magnetic random-access memory (STT-MRAM) into the on-chip L1D cache. FUSE can minimize the number of outgoing memory accesses over the interconnection network of GPU's multiprocessors, which in turn can considerably improve the level of massive computing parallelism in GPUs. Specifically, FUSE predicts a read-level of GPU memory accesses by extracting GPU runtime information and places write-once-read-multiple (WORM) data blocks into the STT-MRAM, while accommodating write-multiple data blocks over a small portion of SRAM in the L1D cache. To further reduce the off-chip memory accesses, FUSE also allows WORM data blocks to be allocated anywhere in the STT-MRAM by approximating the associativity with the limited number of tag comparators and I/O peripherals. Our evaluation results show that, in…
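The placement idea in the abstract (write-once-read-multiple blocks go to STT-MRAM, write-multiple blocks stay in SRAM) can be illustrated with a small sketch. This is not the authors' predictor: the abstract says FUSE extracts GPU runtime information, whereas the code below substitutes simple per-block read and write counters as a stand-in. The names `PlacementSketch`, `AccessHistory`, and `WRITE_ONCE_LIMIT`, and the rule that unseen blocks default to SRAM, are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumption: a block written at most this many times counts as "write-once".
WRITE_ONCE_LIMIT = 1


@dataclass
class AccessHistory:
    """Per-block access counters (a hypothetical stand-in for runtime info)."""
    reads: int = 0
    writes: int = 0


@dataclass
class PlacementSketch:
    """Toy policy: route WORM blocks to STT-MRAM and everything else to SRAM."""
    history: dict = field(default_factory=dict)

    def record(self, block_addr: int, is_write: bool) -> None:
        # Update the counters for the block touched by this access.
        h = self.history.setdefault(block_addr, AccessHistory())
        if is_write:
            h.writes += 1
        else:
            h.reads += 1

    def choose_region(self, block_addr: int) -> str:
        # Unseen blocks fall back to SRAM (an assumption, not from the paper).
        h = self.history.get(block_addr, AccessHistory())
        is_worm = h.writes <= WRITE_ONCE_LIMIT and h.reads >= 2
        return "STT-MRAM" if is_worm else "SRAM"


policy = PlacementSketch()
trace = [
    (0x100, True), (0x100, False), (0x100, False),  # written once, read twice
    (0x200, True), (0x200, True), (0x200, False),   # written twice
]
for addr, is_write in trace:
    policy.record(addr, is_write)

print(policy.choose_region(0x100))  # STT-MRAM
print(policy.choose_region(0x200))  # SRAM
```

The split reflects the motivation stated in the abstract: blocks that are written repeatedly are kept in the SRAM portion, while read-dominated blocks are placed in STT-MRAM.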
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Magnetic properties of thin films
