Towards Reliable (and Efficient) Job Executions in a Practical Geo-distributed Data Analytics System
Xiaoda Zhang, Zhuzhong Qian, Sheng Zhang, Yize Li, Xiangbo Li, Xiaoliang Wang, Sanglu Lu

TL;DR
This paper introduces HOUTU, a geo-distributed data analytics system that ensures reliable and efficient job execution across multiple data centers by employing autonomous managers and cooperative resource management.
Contribution
The paper presents HOUTU, a novel system enabling reliable, efficient, and flexible geo-distributed data analytics without requiring job modifications.
Findings
HOUTU achieves near-centralized performance in experiments.
HOUTU guarantees reliable job execution despite failures.
The system effectively manages resources across multiple regions.
Abstract
Geo-distributed data analytics are increasingly common to derive useful information in large organisations. Naive extension of existing cluster-scale data analytics systems to the scale of geo-distributed data centers faces unique challenges including WAN bandwidth limits, regulatory constraints, changeable/unreliable runtime environment, and monetary costs. Our goal in this work is to develop a practical geo-distributed data analytics system that (1) employs an intelligent mechanism for jobs to efficiently utilize (adjust to) the resources (changeable environment) across data centers; (2) guarantees the reliability of jobs despite possible failures; and (3) is generic and flexible enough to run a wide range of data analytics jobs without requiring any changes. To this end, we present a new, general geo-distributed data analytics system, HOUTU, that is composed of multiple…
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Software System Performance and Reliability
