WANify: Gauging and Balancing Runtime WAN Bandwidth for Geo-distributed Data Analytics
Anshuman Das Mohapatra, Kwangsung Oh

TL;DR
WANify is a framework that dynamically predicts WAN bandwidth using machine learning to optimize data transfer and reduce latency and costs in geo-distributed data analytics systems.
Contribution
It introduces a machine learning-based method to accurately gauge runtime WAN bandwidth and optimize connection strategies considering network dynamics and heterogeneity.
Findings
WANify improves WAN throughput by balancing link capacities.
It reduces latency and cost by up to 26% and 16%, respectively.
The approach effectively handles network dynamics and heterogeneity.
Abstract
Accurate wide area network (WAN) bandwidth (BW) is essential for geo-distributed data analytics (GDA) systems to make optimal decisions such as data and task placement to improve performance. Existing GDA systems, however, measure WAN BW statically and independently between data centers (DCs), while data transfer occurs dynamically and simultaneously among DCs during workload execution. Also, they use a single connection WAN BW that cannot capture actual WAN capacities between distant DCs. Such inaccurate WAN BWs yield sub-optimal decisions, inflating overall query latency and cost. In this paper, we present WANify, a new framework that precisely and dynamically gauges achievable runtime WAN BW using a machine learning prediction scheme, decision tree-based Random Forest. This helps GDA systems make better decisions yielding reduced latency and costs including WAN BW monitoring costs.…
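The gap the abstract describes, between BW probed one DC pair at a time and BW achieved when many pairs transfer at once, can be illustrated with a toy model. The sketch below is not WANify's Random Forest predictor. It assumes, purely for illustration, that each DC has a fixed egress and ingress capacity that concurrent flows split equally. The DC names, capacities, and function names are all hypothetical.

```python
from collections import defaultdict


def independent_bw(egress_cap, ingress_cap, src, dst):
    """BW (Mbps) seen when src->dst is probed alone.

    Assumption: the flow is limited only by the slower of the two endpoints.
    """
    return min(egress_cap[src], ingress_cap[dst])


def simultaneous_bw(egress_cap, ingress_cap, flows):
    """BW (Mbps) per flow when all flows run at the same time.

    Assumption: each DC's egress/ingress capacity is split equally among
    the flows that use it (a simplification, not true max-min fairness).
    `flows` is a list of distinct (src, dst) pairs.
    """
    out_count = defaultdict(int)
    in_count = defaultdict(int)
    for src, dst in flows:
        out_count[src] += 1
        in_count[dst] += 1
    return {
        (src, dst): min(egress_cap[src] / out_count[src],
                        ingress_cap[dst] / in_count[dst])
        for src, dst in flows
    }


def transfer_time_s(megabytes, bw_mbps):
    """Seconds needed to move `megabytes` of data at `bw_mbps` megabits/s."""
    return megabytes * 8 / bw_mbps


# Hypothetical capacities in Mbps for three DCs.
egress = {"A": 900, "B": 900, "C": 900}
ingress = {"A": 900, "B": 900, "C": 900}
flows = [("A", "B"), ("A", "C"), ("B", "C")]

static = {f: independent_bw(egress, ingress, *f) for f in flows}
runtime = simultaneous_bw(egress, ingress, flows)

for f in flows:
    print(f, "static:", static[f], "runtime:", runtime[f],
          "1000 MB takes", round(transfer_time_s(1000, runtime[f]), 1), "s")
```

Under these assumed numbers every pair looks like a 900 Mbps link when probed alone, yet each flow gets only half of that once the three transfers overlap, so a placement decision based on the static figures would underestimate transfer time by a factor of two.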
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Software System Performance and Reliability
