PingAn: An Insurance Scheme for Job Acceleration in Geo-distributed Big Data Analytics System
Tiantian Wang, Zhuzhong Qian, Sanglu Lu

TL;DR
PingAn is an online insurance algorithm designed for geo-distributed big data systems that optimizes job completion times by dynamically insuring tasks across clusters, balancing resource use and reliability.
Contribution
It introduces a novel online insurance algorithm with provable performance guarantees for cross-cluster job execution in geo-distributed systems.
Findings
Reduces average job flowtimes by at least 14% compared to state-of-the-art mechanisms.
Achieves up to 40% reduction in job flowtimes in Spark on Yarn.
Demonstrates practicality and effectiveness through trace-driven simulations and a real system implementation.
Abstract
Geo-distributed data analysis in a cloud-edge system is emerging as a daily demand. To save time on wide-area data transfer, some tasks are dispersed to the edges. However, limited computing capacity, overload interference, and cluster-level unreachability make efficient execution at the edges hard, which makes it difficult to guarantee the efficiency and reliability of jobs. Launching copies across clusters can serve as insurance on a task's completion. Given cluster heterogeneity and the accompanying remote data fetches, the choice of clusters for the copies affects execution quality, as different insuring plans yield different revenues. To provide On-Line-Real-Time analysis results, a system needs to insure the geo-distributed resource for the arriving jobs. Our challenge is to achieve the optimal revenue by dynamically weighing the gains due to insurance against the loss of occupying extra…
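The trade-off the abstract describes, the gain from insuring a task with extra copies versus the cost of the resources those copies occupy, can be illustrated with a toy calculation. This is a minimal sketch and not PingAn's algorithm, which this summary does not specify. It assumes that copies fail independently, and the function names, the `value` of a finished task, and the per-copy `cost_per_copy` are all hypothetical.

```python
from math import prod


def completion_probability(failure_probs):
    """Chance that at least one copy finishes.

    Assumption (not from the paper): each copy runs in a different
    cluster and the copies fail independently of each other.
    """
    return 1.0 - prod(failure_probs)


def net_revenue(failure_probs, value, cost_per_copy):
    """Toy revenue: expected value of finishing minus the cost of occupied slots.

    `value` and `cost_per_copy` are hypothetical units for illustration.
    """
    gain = value * completion_probability(failure_probs)
    loss = cost_per_copy * len(failure_probs)
    return gain - loss


if __name__ == "__main__":
    # One copy in a cluster that fails half the time, versus two such copies.
    print(net_revenue([0.5], value=10, cost_per_copy=1))
    print(net_revenue([0.5, 0.5], value=10, cost_per_copy=1))
```

Under these assumptions a second copy raises the completion probability from 0.5 to 0.75, so it pays off only while the added expected value exceeds the cost of the extra slot.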
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Big Data and Business Intelligence
