10 Observations on Google Cluster Trace + 2 Measures for Cluster Utilization Enhancement

Yuqing Zhu; Yilei Wang; Fan Wang

arXiv:1508.02111·cs.DC·August 12, 2015

Open Access

TL;DR

This paper analyzes a Google cluster trace, makes 10 observations not found in previous analyses, and correlates the results with the Borg design to identify two measures that could further improve cluster utilization.

Contribution

It offers new observations from an analysis of the Google cluster trace and suggests two measures that can possibly improve cluster utilization over Borg.

Findings

01. 10 new observations on the Google cluster trace

02. Two measures for potential utilization improvement

03. Correlation of trace analysis with the Borg design

Abstract

Utilization enhancement is a key concern for cluster owners. Google's cluster manager, Borg, runs its clusters at a higher overall utilization than many other clusters achieve. Google recently disclosed the details of Borg and summarized quite a few lessons from its experience with the system. Nevertheless, we find that more can be learned if the Borg design is correlated with the trace analysis of a Google cluster managed by Borg. One such trace was released four years ago. In this paper, we analyze the Google cluster trace and make 10 observations not found in previous analyses. We also correlate the results of our analysis and previous analyses with the Borg design, and from this we identify two measures that can possibly further improve cluster utilization over Borg.

Taxonomy

Topics: Cloud Computing and Resource Management · Caching and Content Delivery · Distributed and Parallel Computing Systems