DABench-LLM: Standardized and In-Depth Benchmarking of Post-Moore Dataflow AI Accelerators for LLMs

Ziyu Hu; Zhiqing Zhong; Weijian Zheng; Zhijing Ye; Xuwei Tan; Xueru Zhang; Zheng Xie; Rajkumar Kettimuthu; Xiaodong Yu

arXiv:2601.19904 · cs.AR · January 29, 2026

TL;DR

DABench-LLM is a comprehensive benchmarking framework for evaluating the performance of dataflow AI accelerators on large language model workloads, addressing a gap in standardized assessment methods.

Contribution

It introduces the first standardized benchmarking framework for LLM training on dataflow accelerators, combining performance profiling and scalability analysis.

Findings

01 Identifies performance bottlenecks on three hardware platforms.

02 Provides optimization strategies for resource utilization.

03 Demonstrates the framework's effectiveness across diverse accelerators.

Abstract

The exponential growth of large language models has outpaced the capabilities of traditional CPU and GPU architectures due to the slowdown of Moore's Law. Dataflow AI accelerators present a promising alternative; however, there remains a lack of in-depth performance analysis and standardized benchmarking methodologies for LLM training. We introduce DABench-LLM, the first benchmarking framework designed for evaluating LLM workloads on dataflow-based accelerators. By combining intra-chip performance profiling and inter-chip scalability analysis, DABench-LLM enables comprehensive evaluation across key metrics such as resource allocation, load balance, and resource efficiency. The framework helps researchers rapidly gain insights into underlying hardware and system behaviors, and provides guidance for performance optimizations. We validate DABench-LLM on three commodity dataflow…

Taxonomy

Topics: Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques · Advanced Neural Network Applications