BigRoots: An Effective Approach for Root-cause Analysis of Stragglers in Big Data System
Honggang Zhou, Yunchun Li, Hailong Yang, Jie Jia, Wei Li

TL;DR
BigRoots is a comprehensive root-cause analysis method for stragglers in big data systems, integrating framework and system features to accurately identify internal and external causes, aiding performance optimization.
Contribution
The paper introduces BigRoots, a novel approach that combines framework and system metrics for detailed root-cause analysis of stragglers in big data environments.
Findings
Effectively identifies root causes of stragglers
Accurately detects internal and external causes
Provides useful guidance for optimization
Abstract
Stragglers are commonly believed to have a great impact on the performance of big data systems. However, the causes of stragglers are complicated. Previous work mostly focuses on straggler detection, schedule-level optimization, and coarse-grained cause analysis. These methods cannot provide valuable insights to help users optimize their programs. In this paper, we propose BigRoots, a general method incorporating both framework and system features for root-cause analysis of stragglers in big data systems. BigRoots considers features from the big data framework, such as shuffle read/write bytes and JVM garbage collection time, as well as system resource utilization such as CPU, I/O, and network, which enables it to detect both internal and external root causes of stragglers. We verify BigRoots by injecting high resource utilization across different system components and perform case studies to…
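The abstract names the kinds of features BigRoots considers but, as excerpted here, does not give the analysis rule itself. The following is only an illustrative sketch of the general idea, not the authors' algorithm: it flags tasks whose duration exceeds 1.5x the median, then lists the features on which a straggler is far above its peers. The threshold, the function names, the feature names, and the sample values are all assumptions made for illustration.

```python
from statistics import median

# Hypothetical feature names: framework-level (shuffle bytes, GC time)
# and system-level (CPU, I/O, network utilization).
FEATURES = ["shuffle_read_bytes", "gc_time", "cpu_util", "io_util", "net_util"]


def find_stragglers(tasks, ratio=1.5):
    """Return tasks whose duration exceeds ratio x the median duration.

    The 1.5x-median rule is an assumed, commonly used heuristic.
    """
    med = median(t["duration"] for t in tasks)
    return [t for t in tasks if t["duration"] > ratio * med]


def suspect_features(task, tasks, features=FEATURES, ratio=1.5):
    """List features where the task exceeds ratio x the median of its peers."""
    suspects = []
    for name in features:
        peer_median = median(t[name] for t in tasks if t is not task)
        if task[name] > ratio * peer_median:
            suspects.append(name)
    return suspects


# Made-up sample data: task 3 runs slowly on a node with very high CPU load.
tasks = [
    {"id": 0, "duration": 10.0, "shuffle_read_bytes": 100, "gc_time": 0.5,
     "cpu_util": 0.40, "io_util": 0.30, "net_util": 0.20},
    {"id": 1, "duration": 11.0, "shuffle_read_bytes": 110, "gc_time": 0.6,
     "cpu_util": 0.45, "io_util": 0.30, "net_util": 0.20},
    {"id": 2, "duration": 9.0, "shuffle_read_bytes": 95, "gc_time": 0.4,
     "cpu_util": 0.42, "io_util": 0.28, "net_util": 0.22},
    {"id": 3, "duration": 30.0, "shuffle_read_bytes": 105, "gc_time": 0.5,
     "cpu_util": 0.95, "io_util": 0.31, "net_util": 0.21},
]

stragglers = find_stragglers(tasks)
report = {t["id"]: suspect_features(t, tasks) for t in stragglers}
print(report)
```

In this toy input only task 3 is flagged, and only its CPU utilization stands out, which would point toward resource contention on the node rather than data skew or garbage collection.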
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques
