A Linear Combination-based Method to Construct Proxy Benchmarks for Big Data Workloads
Yikang Yang, Lei Wang, and Jianfeng Zhan

TL;DR
This paper introduces a linear combination-based method to generate proxy benchmarks that accurately mimic real big data workloads' micro-architectural metrics, enabling faster CPU performance evaluation during early design stages.
Contribution
It proposes a novel linear equation system approach and algorithms to create proxy benchmarks with high accuracy, reducing runtime from hours to seconds.
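The linear equation system idea can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes the metrics are additive event counts, so that running code fragment j a total of x_j times contributes x_j times its per-run counts. The fragment counts, the target values, and the names `solve_linear_system`, `fragment_metrics`, and `target_metrics` are all hypothetical.

```python
from fractions import Fraction


def solve_linear_system(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination.

    Exact rational arithmetic is used so the illustration has no rounding error.
    """
    n = len(a)
    # Build the augmented matrix [a | b].
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick any row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: metric vectors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        # Normalise the pivot row so the pivot entry becomes 1.
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [rv - factor * cv for rv, cv in zip(m[r], m[col])]
    return [row[n] for row in m]


# Hypothetical per-run event counts. Each column is one code fragment;
# rows are: instructions, cache misses, branch mispredictions.
fragment_metrics = [
    [1000, 2000, 500],
    [10, 5, 40],
    [2, 30, 1],
]

# Hypothetical totals the proxy benchmark should reproduce.
target_metrics = [9000, 200, 70]

# weights[j] is how many times fragment j would run in the proxy.
weights = solve_linear_system(fragment_metrics, target_metrics)
```

In practice the system is unlikely to be square or to have whole-number, non-negative solutions, so a real implementation would need some form of constrained fitting; this sketch only shows the core relationship between fragment metrics, weights, and target metrics.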
Findings
Proxy benchmarks achieve over 92% accuracy in micro-architectural metrics.
Average runtime of proxy benchmarks is 1.62 seconds compared to nearly 4 hours for real benchmarks.
The method remains consistent across different system configurations.
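To make the accuracy and runtime figures above concrete, here is a small sketch. The page does not define accuracy, so the relative-error formula below is an assumption (a common choice), and the metric names and values are hypothetical. "Nearly 4 hours" is treated as exactly 4 hours, so the computed speedup is an upper bound.

```python
def metric_accuracy(real, proxy):
    """Assumed definition: 1 minus the relative error of the proxy's metric."""
    if real == 0:
        raise ValueError("relative error is undefined for a zero reference value")
    return 1.0 - abs(proxy - real) / abs(real)


def speedup(real_seconds, proxy_seconds):
    """Ratio of real-benchmark runtime to proxy-benchmark runtime."""
    return real_seconds / proxy_seconds


# Hypothetical metric values for one real workload and its proxy.
real_metrics = {"ipc": 1.25, "l1d_miss_rate": 0.040, "branch_miss_rate": 0.020}
proxy_metrics = {"ipc": 1.20, "l1d_miss_rate": 0.042, "branch_miss_rate": 0.019}

accuracies = {
    name: metric_accuracy(real_metrics[name], proxy_metrics[name])
    for name in real_metrics
}
average_accuracy = sum(accuracies.values()) / len(accuracies)

# Runtime figures quoted above: 1.62 s for the proxy, nearly 4 hours for the real benchmark.
runtime_speedup = speedup(4 * 3600, 1.62)
```

Under these assumptions the runtime ratio comes out a little under 8,900x, which is the scale of reduction that "hours to seconds" describes.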
Abstract
During the early stages of CPU design, benchmarks can only be run on simulators to evaluate CPU performance. However, most big data benchmarks are too large in code size to finish running on simulators within an acceptable time. Moreover, big data benchmarks usually depend on complex software stacks, which are hard to port to simulators. Proxy benchmarks avoid the long running times and complex software stacks while having the same micro-architectural metrics as real benchmarks, which means they can represent real benchmarks' micro-architectural characteristics. Therefore, proxy benchmarks can replace real benchmarks on simulators to evaluate CPU performance. The biggest challenge is how to guarantee that the proxy benchmarks have exactly the same micro-architectural metrics as real benchmarks when the number of…
Taxonomy
Topics: Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems
