ATiM: Autotuning Tensor Programs for Processing-in-DRAM
Yongwon Shin, Dookyung Kang, Hyojin Sung

TL;DR
ATiM is an automated tensor compiler for Processing-in-DRAM (DRAM-PIM) that improves the performance and programmability of memory-intensive applications such as LLMs through systematic autotuning and PIM-aware optimization.
Contribution
ATiM introduces the first fully automated, autotuning-enabled tensor compiler specifically designed for DRAM-PIM systems, enhancing programmability and performance.
Findings
Achieves up to 6.18× performance improvement on UPMEM benchmarks.
Attains 8.21× speedup on GPT-J layers.
Shows that PIM-aware optimizations handle boundary conditions efficiently in DRAM-PIM workloads.
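The boundary problem arises when a tensor dimension does not divide evenly across PIM units or on-chip tiles. The following is a minimal, generic sketch of one common way to confine bounds checks to the tail, assuming a 1-D vector add whose host splits the data across DPUs (UPMEM's in-DRAM processors) and whose kernel walks each slice in fixed-size tiles. The functions `partition`, `tiles`, and `vec_add` are hypothetical illustrations of loop splitting, not ATiM's actual transformation or code.

```python
from typing import Iterator, List, Tuple


def partition(n: int, num_dpus: int) -> List[Tuple[int, int]]:
    """Host side: give each DPU a contiguous [start, end) slice of n elements.

    Uses ceil(n / num_dpus) elements per DPU, so trailing DPUs may get a
    shorter (or empty) slice when n is not evenly divisible.
    """
    chunk = -(-n // num_dpus)  # ceiling division
    return [(min(d * chunk, n), min((d + 1) * chunk, n)) for d in range(num_dpus)]


def tiles(start: int, end: int, tile: int) -> Iterator[Tuple[int, int]]:
    """Kernel side: split [start, end) into full tiles plus at most one tail tile."""
    length = end - start
    n_full = length // tile
    for t in range(n_full):
        yield start + t * tile, tile  # full tile: no per-element bounds check needed
    rem = length - n_full * tile
    if rem:
        yield start + n_full * tile, rem  # tail tile: the only place the boundary matters


def vec_add(a: List[int], b: List[int], num_dpus: int, tile: int) -> List[int]:
    """Simulate a partitioned, tiled vector add and return the result."""
    out = [0] * len(a)
    for start, end in partition(len(a), num_dpus):  # one slice per simulated DPU
        for t_start, t_len in tiles(start, end, tile):
            for i in range(t_start, t_start + t_len):  # trip count is already exact
                out[i] = a[i] + b[i]
    return out
```

For example, `partition(10, 4)` yields slices of 3, 3, 3, and 1 elements, and `tiles(0, 3, 2)` yields one full tile of 2 followed by a tail of 1. Because every trip count is computed up front, the innermost loop never tests `i < n`.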
Abstract
Processing-in-DRAM (DRAM-PIM) has emerged as a promising technology for accelerating memory-intensive operations in modern applications, such as Large Language Models (LLMs). Despite its potential, current software stacks for DRAM-PIM face significant challenges, including reliance on hand-tuned libraries that hinder programmability, limited support for high-level abstractions, and the lack of systematic optimization frameworks. To address these limitations, we present ATiM, a search-based optimizing tensor compiler for UPMEM. Key features of ATiM include: (1) automated searches of the joint search space for host and kernel tensor programs, (2) PIM-aware optimizations for efficiently handling boundary conditions, and (3) improved search algorithms for the expanded search space of UPMEM systems. Our experimental results on UPMEM hardware demonstrate performance gains of up to…
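A "joint search space for host and kernel tensor programs" means that host-side choices (for example, how many DPUs a tensor is split across) and kernel-side choices (for example, tasklets per DPU and tile size) are tuned together rather than separately. The sketch below shows this idea under stated assumptions. The knob names and values, the 64 KB per-DPU scratchpad budget, and `toy_cost` are all hypothetical. The random search is a generic baseline, not ATiM's improved search algorithm.

```python
import itertools
import random
from typing import List, NamedTuple


class Config(NamedTuple):
    num_dpus: int  # host knob: how many DPUs the tensor is split across
    tasklets: int  # kernel knob: threads per DPU
    tile: int      # kernel knob: elements each tasklet stages on-chip per step


# Hypothetical knob values and limits, chosen for illustration only.
HOST_SPACE = {"num_dpus": [64, 128, 256, 512]}
KERNEL_SPACE = {"tasklets": [4, 8, 16], "tile": [64, 128, 256, 512]}
SCRATCH_BYTES = 64 * 1024  # assumed per-DPU scratchpad budget
ELEM_BYTES = 4             # assumed int32 elements
BUFFERS = 3                # assumed buffers per tasklet (a, b, out)


def joint_space() -> List[Config]:
    """Cross product of host and kernel knobs, filtered by the scratchpad budget."""
    return [
        Config(d, t, s)
        for d, t, s in itertools.product(
            HOST_SPACE["num_dpus"], KERNEL_SPACE["tasklets"], KERNEL_SPACE["tile"]
        )
        if t * BUFFERS * s * ELEM_BYTES <= SCRATCH_BYTES
    ]


def toy_cost(cfg: Config, n: int) -> float:
    """Made-up cost: per-DPU work spread over tasklets, a per-step overhead,
    and a per-DPU overhead. Not a real UPMEM performance model."""
    per_dpu = -(-n // cfg.num_dpus)                     # ceil(n / num_dpus)
    steps = -(-per_dpu // (cfg.tasklets * cfg.tile))    # tiles processed per tasklet
    return per_dpu / cfg.tasklets + steps * 50 + cfg.num_dpus * 2


def random_search(n: int, trials: int, seed: int = 0) -> Config:
    """Sample up to `trials` valid configs and keep the cheapest under toy_cost."""
    rng = random.Random(seed)
    space = joint_space()
    candidates = rng.sample(space, min(trials, len(space)))
    return min(candidates, key=lambda c: toy_cost(c, n))
```

Filtering on the scratchpad budget before searching shows why the joint space is larger and more constrained than either half alone: a kernel tile size that is valid with 8 tasklets (`Config(64, 8, 512)`) becomes invalid with 16.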
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Computational Physics and Python Applications · Distributed and Parallel Computing Systems
