TCIM: Triangle Counting Acceleration With Processing-In-MRAM Architecture

Xueyan Wang; Jianlei Yang; Yinglin Zhao; Yingjie Qi; Meichen Liu; Xingzhou Cheng; Xiaotao Jia; Xiaoming Chen; Gang Qu; Weisheng Zhao

arXiv:2007.10702 · cs.AR · July 22, 2020 · 1 citation


TL;DR

This paper introduces a novel in-memory triangle counting accelerator using processing-in-MRAM, significantly reducing data transfer bottlenecks and achieving substantial speedups and energy efficiency improvements over traditional GPU and FPGA solutions.

Contribution

The paper presents a new in-memory TC acceleration method using bitwise logic in STT-MRAM, with optimized data mapping techniques for enhanced performance and energy efficiency.
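To make the switch from matrix computation to bitwise logic concrete, the identity below relates the two views. It assumes an undirected simple graph G = (V, E) with adjacency matrix A, and writes U_i for the bit vector of neighbours of vertex i whose index is greater than i. This upper-triangular orientation is a standard trick chosen for illustration; it is not necessarily the exact data mapping used in the paper.

$$
T \;=\; \frac{1}{6}\,\operatorname{tr}\!\left(A^{3}\right) \;=\; \sum_{\substack{(i,j)\in E \\ i<j}} \operatorname{popcount}\!\left(U_i \wedge U_j\right)
$$

For an edge (i, j) with i < j, a set bit k in U_i AND U_j means k > j and k is adjacent to both i and j, so every triangle a < b < c is counted exactly once, at edge (a, b) with k = c. The left-hand side needs dense multiply-accumulate work, while the right-hand side needs only AND and bit counting, which is the kind of operation a computational memory array can perform in place.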

Findings

1. Achieves 9x speedup over GPU and 23.4x over FPGA
2. Reduces computation by 99.99% and memory writes by 72%
3. Improves energy efficiency by 20.6x over FPGA

Abstract

Triangle counting (TC) is a fundamental problem in graph analysis and has found numerous applications, which motivates many TC acceleration solutions in the traditional computing platforms like GPU and FPGA. However, these approaches suffer from the bandwidth bottleneck because TC calculation involves a large amount of data transfers. In this paper, we propose to overcome this challenge by designing a TC accelerator utilizing the emerging processing-in-MRAM (PIM) architecture. The true innovation behind our approach is a novel method to perform TC with bitwise logic operations (such as AND), instead of the traditional approaches such as matrix computations. This enables the efficient in-memory implementations of TC computation, which we demonstrate in this paper with computational Spin-Transfer Torque Magnetic RAM (STT-MRAM) arrays. Furthermore, we develop customized graph…
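The contrast the abstract draws between matrix computation and bitwise AND can be sketched in software. The block below is a minimal illustration, not the paper's accelerator or mapping: Python integers stand in for memory rows holding per-vertex adjacency bit vectors, and the function names (`build_upper_rows`, `count_triangles_bitwise`, `count_triangles_matrix`) are hypothetical. It assumes an undirected simple graph given as an edge list over vertices 0..n-1, and it checks the bitwise count against the classic trace(A^3)/6 matrix formulation.

```python
def build_upper_rows(n, edges):
    """Return one bit vector per vertex: bit k of rows[i] is set iff
    {i, k} is an edge and k > i (upper-triangular orientation)."""
    rows = [0] * n
    for u, v in edges:
        if u == v:
            continue  # self-loops cannot form triangles
        i, k = min(u, v), max(u, v)
        rows[i] |= 1 << k  # duplicate edges are harmless: OR is idempotent
    return rows


def count_triangles_bitwise(n, edges):
    """Count triangles using only AND and popcount on bit vectors."""
    rows = build_upper_rows(n, edges)
    total = 0
    for i in range(n):
        pending = rows[i]
        while pending:
            low = pending & -pending   # isolate the lowest set bit
            j = low.bit_length() - 1   # neighbour j > i, i.e. edge (i, j)
            # Bits set in both rows are common neighbours k > j,
            # each closing exactly one triangle (i, j, k).
            total += bin(rows[i] & rows[j]).count("1")
            pending ^= low             # clear that bit and continue
    return total


def count_triangles_matrix(n, edges):
    """Reference count: trace(A^3) / 6 using dense matrix products."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        if u != v:
            A[u][v] = A[v][u] = 1
    A2 = [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    trace_a3 = sum(A2[i][k] * A[k][i] for i in range(n) for k in range(n))
    return trace_a3 // 6


if __name__ == "__main__":
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(count_triangles_bitwise(4, k4))  # 4
    print(count_triangles_matrix(4, k4))   # 4
```

The orientation (only neighbours with a larger index are stored) is what lets each triangle be counted once without a final division, and the inner step is a single AND followed by a bit count. In hardware, that pair of operations is the part that a computational STT-MRAM array could in principle execute in place, avoiding the data transfers the abstract identifies as the bottleneck.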


Taxonomy

Topics: Ferroelectric and Negative Capacitance Devices · Parallel Computing and Optimization Techniques · Advanced Memory and Neural Computing