AME-PIM: Can Memory be Your Next Tensor Accelerator?

Emanuele Venieri; Simone Manoni; Alberto Florian; Jaehyun Park; Kyomin Sohn; Andrea Bartolini

arXiv:2604.27808·cs.AR·May 1, 2026

TL;DR

This paper explores using High Bandwidth Memory with Processing-in-Memory (HBM-PIM) as a backend for ISA-level matrix acceleration, and proposes a new execution model and dataflow that reduce data movement and host involvement.

Contribution

It introduces a PEP-based execution model and a reduction-free dataflow for HBM-PIM to support matrix operations with minimized host involvement.

Findings

01. Achieves up to 14.9 GFLOP/s on Samsung Aquabolt-XL.

02. Supports end-to-end execution of element-wise, GEMV, and GEMM operations.

03. Enables in-memory accumulation despite lack of native reduction support.

Abstract

High Bandwidth Memory with Processing-in-Memory (HBM-PIM) offers an opportunity to reduce data movement by executing computation directly inside memory, but current commercial platforms expose limited instruction sets and require specialized software stacks. In this work, we investigate whether HBM-PIM can serve as a backend for ISA-level matrix acceleration, using the RISC-V Attached Matrix Extension (AME) as a semantic reference. We propose a PEP-based execution model that maps AME element-wise and matrix instructions to HBM-PIM micro-kernels and data instructions in memory operations. Unlike state-of-the-art HBM-PIM approaches, we introduce a reduction-free outer-product dataflow that enables accumulation entirely within memory despite the lack of native reduction support. Our approach supports end-to-end execution of element-wise operations, GEMV, and GEMM in PIM mode, minimizing host involvement…
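
To make the phrase "reduction-free outer-product dataflow" concrete, here is a minimal sketch in plain Python. It is not the paper's kernel and does not model Aquabolt-XL or AME instructions; the function names are hypothetical and chosen only for illustration. It shows the general idea that a matrix product can be written as a sum of rank-1 (outer-product) updates, so each output element is updated by an independent multiply-accumulate and no sum across lanes is required.

```python
def gemm_inner_product(A, B):
    """Reference GEMM: each C[i][j] is a dot product, i.e. a reduction over k."""
    m, k, n = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def gemm_outer_product(A, B):
    """GEMM as k rank-1 updates: C += A[:, p] (outer) B[p, :].

    Every step is an element-wise multiply-accumulate into C, so no
    cross-element reduction is needed; C plays the role of an accumulator
    that stays in place (illustrative assumption, not the paper's mapping).
    """
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for p in range(k):
        col = [A[i][p] for i in range(m)]  # p-th column of A
        row = B[p]                         # p-th row of B
        for i in range(m):
            for j in range(n):
                C[i][j] += col[i] * row[j]  # independent per-element update
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(gemm_outer_product(A, B))
```

Both functions compute the same product; they differ only in loop order. In the inner-product form the sum over k is the innermost operation, whereas in the outer-product form it is spread across successive element-wise updates of the accumulator.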
