BigDataBench: A Scalable and Unified Big Data and AI Benchmark Suite
Wanling Gao, Jianfeng Zhan, Lei Wang, Chunjie Luo, Daoyi Zheng, Xu Wen, Rui Ren, Chen Zheng, Xiwen He, Hainan Ye, Haoning Tang, Zheng Cao, Shujie Zhang, Jiahui Dai

TL;DR
BigDataBench 4.0 introduces a scalable, unified benchmarking suite for big data and AI workloads based on data motifs, facilitating domain-specific hardware/software co-design and comprehensive CPU pipeline analysis.
Contribution
It proposes a novel scalable benchmarking methodology using data motifs and presents a unified benchmark suite for big data and AI workloads.
Findings
Identification of eight key data motifs for workload representation
Unified benchmark suite enables co-design of hardware and software
Comprehensive CPU pipeline efficiency characterization
Abstract
Several fundamental changes in technology indicate domain-specific hardware and software co-design is the only path left. In this context, architecture, system, data management, and machine learning communities pay greater attention to innovative big data and AI algorithms, architecture, and systems. Unfortunately, complexity, diversity, frequently changing workloads, and rapid evolution of big data and AI systems raise great challenges. First, the traditional benchmarking methodology that creates a new benchmark or proxy for every possible workload is not scalable, or even impossible for Big Data and AI benchmarking. Second, it is prohibitively expensive to tailor the architecture to characteristics of one or more applications or even a domain of applications. We consider each big data and AI workload as a pipeline of one or more classes of units of computation performed on different…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Advanced Data Storage Technologies
