Understanding Big Data Analytic Workloads on Modern Processors
Zhen Jia, Lei Wang, Jianfeng Zhan, Lixin Zhang, Chunjie Luo, Ninghui Sun

TL;DR
This paper characterizes the micro-architectural behaviors of 11 representative big data analytics workloads on modern processors, revealing their unique characteristics and impacts on system performance, with practical recommendations for system optimization.
Contribution
It provides a detailed analysis of big data analytics workloads' micro-architectural behaviors and their differences from traditional workloads, including the impact of software stacks.
Findings
Big data workloads have distinct micro-architectural characteristics.
Long latency data accesses are the main factor affecting CPI.
Software stacks significantly contribute to front end stalls.
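As a minimal sketch of the metric behind the second finding: CPI (cycles per instruction) is the ratio of two hardware performance counter readings, and the share of cycles lost to stalls is a second ratio. The function names and the sample counts below are illustrative assumptions, not code or figures from the paper.

```python
def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction from two raw counter readings."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def stall_fraction(stall_cycles: int, cycles: int) -> float:
    """Share of all cycles in which the pipeline was stalled."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return stall_cycles / cycles


if __name__ == "__main__":
    # Hypothetical counter values, chosen only to illustrate the arithmetic.
    total_cycles = 2_000_000
    retired_instructions = 1_000_000
    memory_stall_cycles = 500_000

    print("CPI:", cpi(total_cycles, retired_instructions))
    print("Stall share:", stall_fraction(memory_stall_cycles, total_cycles))
```

A higher CPI means more cycles are spent per retired instruction; comparing the stall share across workloads is one simple way to see how much long-latency data accesses contribute.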
Abstract
Big data analytics applications play a significant role in data centers, and hence it has become increasingly important to understand their behaviors in order to further improve the performance of data center computer systems, in which characterizing representative workloads is a key practical problem. In this paper, after investigating three most important application domains in terms of page views and daily visitors, we chose 11 representative data analytics workloads and characterized their micro-architectural behaviors by using hardware performance counters, so as to understand the impacts and implications of data analytics workloads on the systems equipped with modern superscalar out-of-order processors. Our study reveals that big data analytics applications themselves share many inherent characteristics, which place them in a different class from traditional workloads and…
Taxonomy
Topics: Cloud Computing and Resource Management · Graph Theory and Algorithms · Advanced Data Storage Technologies
