Triangle Counting Accelerations: From Algorithm to In-Memory Computing Architecture
Xueyan Wang, Jianlei Yang, Yinglin Zhao, Xiaotao Jia, Rong Yin, Xuhang Chen, Gang Qu, and Weisheng Zhao

TL;DR
This paper presents a novel in-memory computing architecture for triangle counting in graphs, reformulating the problem with bitwise operations and leveraging STT-MRAM PIM technology to significantly outperform traditional GPU and FPGA accelerators in speed and energy efficiency.
Contribution
It introduces a co-optimized algorithm-architecture approach for triangle counting using processing-in-memory with STT-MRAM, achieving substantial performance and energy efficiency improvements.
Findings
Outperforms GPU by 12.2x in speed
Outperforms FPGA by 31.8x in speed
Achieves 34x better energy efficiency than FPGA
Abstract
Triangles are the basic substructure of networks, and triangle counting (TC) is a fundamental graph computing problem in numerous fields such as social network analysis. Nevertheless, like other graph computing problems, TC has a high memory-to-computation ratio and a random memory access pattern, so it involves a large amount of data transfer and suffers from the bandwidth bottleneck of the traditional von Neumann architecture. To overcome this challenge, in this paper we propose to accelerate TC with the emerging processing-in-memory (PIM) architecture through algorithm-architecture co-optimization. To enable efficient in-memory implementation, we reformulate TC with bitwise logic operations (such as AND) and develop customized graph compression and mapping techniques for efficient data flow management. With the emerging computational Spin-Transfer Torque…
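To make the bitwise reformulation concrete, here is a minimal software sketch of triangle counting with AND and bit counting. It is an illustration under assumptions, not the authors' in-memory implementation: the function name `count_triangles`, the use of Python integers as bit vectors, and the choice to orient each edge from its lower-numbered to its higher-numbered vertex are all hypothetical. For every oriented edge (i, j), the sketch ANDs row i of the oriented adjacency matrix with column j and counts the set bits, so each triangle is counted exactly once.

```python
def count_triangles(num_vertices, edges):
    """Count triangles in an undirected graph using bitwise AND.

    Assumption: vertices are integers in range(num_vertices) and
    `edges` is an iterable of (u, v) pairs; duplicates and self-loops
    are ignored.
    """
    # rows[i]: bit k is set if there is an edge i-k with k > i.
    # cols[j]: bit k is set if there is an edge k-j with k < j.
    rows = [0] * num_vertices
    cols = [0] * num_vertices
    oriented = set()

    for u, v in edges:
        if u == v:
            continue  # skip self-loops
        i, j = (u, v) if u < v else (v, u)
        if (i, j) in oriented:
            continue  # skip duplicate edges
        oriented.add((i, j))
        rows[i] |= 1 << j
        cols[j] |= 1 << i

    total = 0
    for i, j in oriented:
        # Set bits of the AND are vertices k with i < k < j that are
        # adjacent to both i and j, i.e. triangles closed by edge (i, j).
        common = rows[i] & cols[j]
        total += bin(common).count("1")
    return total


if __name__ == "__main__":
    # Complete graph on four vertices.
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(count_triangles(4, k4))
```

In this sketch the only operations on graph data are AND and bit counting, which is the kind of workload the abstract argues can be pushed into memory instead of being moved across the memory bus.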
