ATiM: Autotuning Tensor Programs for Processing-in-DRAM

Yongwon Shin; Dookyung Kang; Hyojin Sung

arXiv:2412.19630·cs.AR·June 3, 2025

Open Access

TL;DR

ATiM is an automated tensor compiler for Processing-in-DRAM that significantly improves performance and programmability of memory-intensive applications like LLMs through systematic autotuning and optimization.

Contribution

ATiM introduces the first fully automated, autotuning-enabled tensor compiler specifically designed for DRAM-PIM systems, enhancing programmability and performance.

Findings

1. Achieves up to 6.18× performance improvement on UPMEM benchmarks.

2. Attains a speedup of up to 8.21× on GPT-J layers.

3. Demonstrates effective optimization and boundary handling for DRAM-PIM workloads.

Abstract

Processing-in-DRAM (DRAM-PIM) has emerged as a promising technology for accelerating memory-intensive operations in modern applications, such as Large Language Models (LLMs). Despite its potential, current software stacks for DRAM-PIM face significant challenges, including reliance on hand-tuned libraries that hinder programmability, limited support for high-level abstractions, and the lack of systematic optimization frameworks. To address these limitations, we present ATiM, a search-based optimizing tensor compiler for UPMEM. Key features of ATiM include: (1) automated searches of the joint search space for host and kernel tensor programs, (2) PIM-aware optimizations for efficiently handling boundary conditions, and (3) improved search algorithms for the expanded search space of UPMEM systems. Our experimental results on UPMEM hardware demonstrate performance gains of up to…
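
To make the boundary-condition issue mentioned in feature (2) concrete, here is a minimal Python sketch of the general problem: when a tensor is split across many processing units and then tiled, the tile size rarely divides the data evenly, so some tiles are partial. This is a generic illustration only, not ATiM's implementation or its actual optimization; the function names (`split_rows`, `tiles`, `vector_add`) and parameters (`num_units`, `tile`) are hypothetical and chosen for this example.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open interval [start, end)


def split_rows(n: int, num_units: int) -> List[Range]:
    """Assign contiguous ranges of n elements to num_units units.

    Trailing units may get a shorter (or empty) range when n is not a
    multiple of num_units.
    """
    chunk = -(-n // num_units)  # ceiling division
    return [(min(u * chunk, n), min((u + 1) * chunk, n)) for u in range(num_units)]


def tiles(start: int, end: int, tile: int) -> Tuple[List[Range], List[Range]]:
    """Split [start, end) into full tiles and at most one partial tile.

    Full tiles can be processed without per-element bounds checks;
    only the partial tile needs the boundary-aware path.
    """
    n_full = (end - start) // tile
    full = [(start + i * tile, start + (i + 1) * tile) for i in range(n_full)]
    rem_start = start + n_full * tile
    partial = [(rem_start, end)] if rem_start < end else []
    return full, partial


def vector_add(a: List[int], b: List[int], num_units: int, tile: int) -> List[int]:
    """Element-wise add, emulating per-unit tiled execution on the host."""
    n = len(a)
    out = [0] * n
    for start, end in split_rows(n, num_units):
        full, partial = tiles(start, end, tile)
        for lo, hi in full + partial:
            for i in range(lo, hi):
                out[i] = a[i] + b[i]
    return out
```

For example, `vector_add(list(range(10)), [1] * 10, num_units=4, tile=2)` splits ten elements over four units, where the last unit receives a single element that forms a partial tile. A compiler targeting real hardware would have to choose between padding, bounds checks, or separate code paths for such tiles; this sketch only shows where the partial tiles arise.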

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Computational Physics and Python Applications · Distributed and Parallel Computing Systems