PIM or CXL-PIM? Understanding Architectural Trade-offs Through Large-Scale Benchmarking
I-Ting Lee, Bao-Kai Wang, Liang-Chi Chen, Wen Sheng Lim, Da-Wei Chang, Yu-Ming Chang, and Chieng-Chung Ho

TL;DR
This paper compares processing-in-memory (PIM) and CXL-PIM architectures through large-scale benchmarking. It identifies the workload-dependent regimes in which each approach performs better, to guide future near-memory system design.
Contribution
It provides the first large-scale comparison of PIM and CXL-PIM. The comparison combines measurements on real PIM hardware with trace-driven CXL modeling and analyzes performance tradeoffs across workload regimes.
Findings
CXL-PIM's unified address space avoids explicit host-device staging transfers, reducing data-transfer bottlenecks.
Whether PIM or CXL-PIM performs better depends on workload phase and dataset size.
Practical guidance for designing near-memory systems based on workload characteristics.
Abstract
Processing-in-memory (PIM) reduces data movement by executing near memory, but our large-scale characterization on real PIM hardware shows that end-to-end performance is often limited by disjoint host and device address spaces that force explicit staging transfers. In contrast, CXL-PIM provides a unified address space and cache-coherent access at the cost of higher access latency. These opposing interface models create workload-dependent tradeoffs that are not captured by small-scale studies. This work presents a side-by-side, large-scale comparison of PIM and CXL-PIM using measurements from real PIM hardware and trace-driven CXL modeling. We identify when unified-address access amortizes link latency enough to overcome transfer bottlenecks, and when tightly coupled PIM remains preferable. Our results reveal phase- and dataset-size regimes in which the relative ranking between the two…
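The abstract's central tradeoff pits explicit staging transfers (PIM) against higher per-access link latency (CXL-PIM). A toy cost model can illustrate it. This is a minimal sketch under stated assumptions, not the authors' methodology. `ToyParams`, all of its default values, the 64-byte access granularity, and the assumption that both designs spend the same kernel time are hypothetical placeholders, not measurements from the paper. The model charges PIM for staging the whole dataset plus a fixed setup cost. It charges CXL-PIM only for the cache lines actually touched, with link latency overlapped across in-flight requests.

```python
from dataclasses import dataclass

CACHE_LINE_BYTES = 64  # hypothetical access granularity for unified-address requests


@dataclass(frozen=True)
class ToyParams:
    # Every value below is an illustrative placeholder, NOT a measurement from the paper.
    staging_gbps: float = 10.0      # host-to-PIM staging bandwidth (GB/s)
    staging_setup_us: float = 20.0  # fixed staging overhead per launch (us)
    cxl_extra_ns: float = 200.0     # extra link latency per CXL access (ns)
    outstanding: int = 16           # CXL requests kept in flight (overlaps latency)
    kernel_us: float = 100.0        # near-memory compute time, assumed equal for both


def pim_time_us(dataset_bytes: int, p: ToyParams) -> float:
    # PIM: the full dataset is copied into the device's private address space first.
    # 1 GB/s moves 1e3 bytes per microsecond.
    staging_us = p.staging_setup_us + dataset_bytes / (p.staging_gbps * 1e3)
    return staging_us + p.kernel_us


def cxl_pim_time_us(dataset_bytes: int, touched_fraction: float, p: ToyParams) -> float:
    # CXL-PIM: no staging copy, but each touched cache line pays the extra link
    # latency, amortized across `outstanding` concurrent requests.
    lines_touched = dataset_bytes * touched_fraction / CACHE_LINE_BYTES
    link_us = lines_touched * p.cxl_extra_ns / p.outstanding / 1e3
    return link_us + p.kernel_us


def faster(dataset_bytes: int, touched_fraction: float, p: ToyParams = ToyParams()) -> str:
    # Report which interface model wins under this toy model.
    pim = pim_time_us(dataset_bytes, p)
    cxl = cxl_pim_time_us(dataset_bytes, touched_fraction, p)
    return "PIM" if pim < cxl else "CXL-PIM"


if __name__ == "__main__":
    # Sweep dataset size and access density to expose regime changes.
    for size_kib in (64, 256, 1024, 16384):
        for frac in (1.0, 0.1):
            print(f"{size_kib:>6} KiB, touched={frac:.1f}: {faster(size_kib * 1024, frac)}")
```

With these placeholder values, CXL-PIM wins on small, densely accessed datasets because it skips the fixed staging cost. PIM wins once the dataset is large enough for bulk staging bandwidth to beat per-line link latency. CXL-PIM wins again when only a small fraction of the data is touched. The actual crossover points depend entirely on measured bandwidths, latencies, and memory-level parallelism, which this sketch does not attempt to reproduce.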
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Advanced Memory and Neural Computing
