Data Motif-based Proxy Benchmarks for Big Data and AI Workloads
Wanling Gao, Jianfeng Zhan, Lei Wang, Chunjie Luo, Zhen Jia, Daoyi Zheng, Chen Zheng, Xiwen He, Hainan Ye, Haibin Wang, and Rui Ren

TL;DR
This paper introduces a machine learning-based methodology to create data motif-driven proxy benchmarks that accurately mimic big data and AI workloads, significantly reducing simulation time while maintaining high fidelity.
Contribution
It presents a novel approach to construct practical proxy benchmarks from data motifs using machine learning, enabling efficient and accurate simulation of complex workloads.
Findings
The proxy benchmarks shorten execution time by hundreds of times.
They maintain over 90% accuracy in performance data.
They reflect consistent performance trends across architectures.
Abstract
For the architecture community, reasonable simulation time is a strong requirement in addition to performance data accuracy. However, emerging big data and AI workloads are huge in binary size and prohibitively expensive to run on cycle-accurate simulators. The concept of a data motif, which is identified as a class of units of computation performed on initial or intermediate data, is the first step towards building a proxy benchmark to mimic real-world big data and AI workloads. However, there is no practical way to construct a proxy benchmark based on data motifs to help simulation-based research. In this paper, we embark on a study to bridge the gap between data motifs and a practical proxy benchmark. We propose a data motif-based proxy benchmark generating methodology by means of a machine learning method, which combines data motifs with different weights to mimic the big…
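The idea of combining data motifs with different weights can be illustrated with a minimal sketch. It is not the authors' method: it swaps in plain projected gradient descent for the machine learning method the abstract refers to, and every name and number in it (`fit_motif_weights`, the three motifs, the three metric columns) is a hypothetical chosen for illustration. It assumes each motif and the target workload are summarized by a small vector of performance metrics, and it looks for non-negative weights whose weighted sum of motif metrics comes close to the target.

```python
# Hypothetical sketch: fit non-negative motif weights so that the weighted
# sum of per-motif metric vectors approximates a target workload's metrics.
# Not the paper's algorithm; all values below are made up for illustration.

def fit_motif_weights(motif_metrics, target, steps=20000, lr=0.02):
    """Minimise the squared relative error with projected gradient descent."""
    n_motifs, n_metrics = len(motif_metrics), len(target)
    # Scale every metric by the target value so each one counts equally.
    scaled = [[row[k] / target[k] for k in range(n_metrics)]
              for row in motif_metrics]
    w = [1.0 / n_motifs] * n_motifs
    for _ in range(steps):
        # Relative error of the current weighted combination, per metric.
        err = [sum(w[i] * scaled[i][k] for i in range(n_motifs)) - 1.0
               for k in range(n_metrics)]
        for i in range(n_motifs):
            grad = 2.0 * sum(err[k] * scaled[i][k] for k in range(n_metrics))
            # Projection step: a weight cannot go below zero.
            w[i] = max(0.0, w[i] - lr * grad)
    return w


def proxy_accuracy(motif_metrics, weights, target):
    """Return 1 minus the mean relative error across metrics."""
    n_metrics = len(target)
    pred = [sum(w * row[k] for w, row in zip(weights, motif_metrics))
            for k in range(n_metrics)]
    rel = [abs(pred[k] - target[k]) / target[k] for k in range(n_metrics)]
    return 1.0 - sum(rel) / n_metrics


# Made-up metric vectors: [IPC, cache miss ratio, branch miss ratio].
motif_metrics = [
    [0.8, 0.10, 0.30],   # hypothetical "sort" motif
    [1.6, 0.02, 0.05],   # hypothetical "matrix multiply" motif
    [1.1, 0.05, 0.20],   # hypothetical "sampling" motif
]
# Made-up target workload, built here as 0.5 / 0.3 / 0.2 of the motifs above.
target = [1.10, 0.066, 0.205]

weights = fit_motif_weights(motif_metrics, target)
accuracy = proxy_accuracy(motif_metrics, weights, target)
print("weights:", [round(w, 3) for w in weights])
print("accuracy:", round(accuracy, 4))
```

Scaling each metric by its target value keeps a large-valued metric such as IPC from dominating small ratios. A real proxy would also have to tune motif parameters such as input size, which this sketch leaves out.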
Taxonomy
Topics: Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies
