A Full-Stack Performance Evaluation Infrastructure for 3D-DRAM-based LLM Accelerators
Cong Li, Chenhao Xue, Yi Ren, Xiping Dong, Yu Cheng, Yinbo Hu, Fujun Bai, Yixin Guo, Xiping Jiang, Qiang Wu, Zhi Yang, Zhe Cheng, Yuan Xie, Guangyu Sun

TL;DR
This paper introduces ATLAS, a comprehensive, validated simulation framework for 3D-DRAM-based LLM accelerators, enabling detailed performance analysis and design exploration across diverse use cases.
Contribution
ATLAS provides the first open-source, full-stack simulation tool for 3D-DRAM LLM accelerators, bridging the gap between silicon validation and flexible architecture modeling.
Findings
ATLAS achieves ≤8.57% simulation error compared to silicon.
It demonstrates 97.26-99.96% correlation with measured performance.
It enables effective design space exploration for 3D-DRAM LLM accelerators.
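As a reading aid for the two accuracy figures above, here is a minimal sketch of how a simulation error and a correlation percentage could be computed from paired simulator and silicon measurements. It assumes the error is a maximum relative error and the correlation is a Pearson coefficient expressed in percent; the summary does not say which definitions the authors use. The function names and the latency values are hypothetical and are not taken from the paper.

```python
import math


def max_relative_error_pct(simulated, measured):
    """Largest |simulated - measured| / measured over all workloads, in percent."""
    return max(abs(s - m) / m for s, m in zip(simulated, measured)) * 100.0


def pearson_correlation_pct(simulated, measured):
    """Pearson correlation between simulated and measured values, in percent."""
    n = len(simulated)
    mean_s = sum(simulated) / n
    mean_m = sum(measured) / n
    cov = sum((s - mean_s) * (m - mean_m) for s, m in zip(simulated, measured))
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov / math.sqrt(var_s * var_m) * 100.0


# Hypothetical per-workload latencies (arbitrary units), NOT data from the paper.
measured = [10.0, 20.0, 40.0, 80.0]
simulated = [10.5, 19.0, 42.0, 78.0]

error_pct = max_relative_error_pct(simulated, measured)
corr_pct = pearson_correlation_pct(simulated, measured)
print(f"max relative error: {error_pct:.2f}%")
print(f"correlation: {corr_pct:.2f}%")
```

Under these assumed definitions, a low maximum error says the simulator is close on every workload, while a high correlation says it tracks the trend across workloads even where absolute values differ.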
Abstract
Large language models (LLMs) exhibit memory-intensive behavior during decoding, making it a key bottleneck in LLM inference. To accelerate decoding execution, hybrid-bonding-based 3D-DRAM has been adopted in LLM accelerators. While this emerging technology provides strong performance gains over existing hardware, current 3D-DRAM accelerators (3D-Accelerators) rely on closed-source evaluation tools, limiting access to publicly available performance analysis methods. Moreover, existing designs are highly customized for specific scenarios, lacking general and reusable full-stack modeling of 3D-Accelerators across diverse use cases. To bridge this fundamental gap, we present ATLAS, the first silicon-proven Architectural Three-dimensional-DRAM-based LLM Accelerator Simulation framework. Built on commercially deployed multi-layer 3D-DRAM technology, ATLAS introduces unified abstractions…
