Data Driven Resource Allocation for Distributed Learning
Travis Dick, Mu Li, Venkata Krishna Pillutla, Colin White, Maria-Florina Balcan, Alex Smola

TL;DR
This paper introduces a data-dependent dispatching method for distributed machine learning that leverages data structure to improve accuracy, scalability, and fault tolerance, outperforming traditional partitioning techniques.
Contribution
It proposes new algorithms with provable guarantees, extends dispatching rules from small samples to full distributions, and demonstrates superior empirical performance.
Findings
Significantly higher accuracy than baseline methods on various datasets.
Strong scalability with increased computing resources.
Effective handling of fault tolerance and balancedness in distributed settings.
Abstract
In distributed machine learning, data is dispatched to multiple machines for processing. Motivated by the fact that similar data points often belong to the same or similar classes, and more generally, classification rules of high accuracy tend to be "locally simple but globally complex" (Vapnik & Bottou 1993), we propose data dependent dispatching that takes advantage of such structure. We present an in-depth analysis of this model, providing new algorithms with provable worst-case guarantees, analysis proving existing scalable heuristics perform well in natural non worst-case conditions, and techniques for extending a dispatching rule from a small sample to the entire distribution. We overcome novel technical challenges to satisfy important conditions for accurate distributed learning, including fault tolerance and balancedness. We empirically compare our approach with baselines based…
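The idea of extending a dispatching rule from a small sample to unseen data can be illustrated with a minimal sketch. It is not the authors' algorithm: the farthest-first center selection, the greedy capacity-limited assignment, and the nearest-neighbor extension are stand-in choices assumed here for illustration, and the names farthest_first_centers, balanced_assign, and dispatch are hypothetical. Fault tolerance (replicating points across machines) is left out.

```python
import math


def farthest_first_centers(sample, k):
    """Pick k centers by farthest-first traversal (an assumed stand-in heuristic)."""
    centers = [sample[0]]
    while len(centers) < k:
        # Next center: the sample point farthest from all centers chosen so far.
        nxt = max(sample, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


def balanced_assign(sample, centers, capacity):
    """Greedily send each sample point to its closest machine that still has room."""
    if capacity * len(centers) < len(sample):
        raise ValueError("total capacity is smaller than the sample")
    loads = [0] * len(centers)
    assignment = []
    for p in sample:
        by_distance = sorted(range(len(centers)),
                             key=lambda i: math.dist(p, centers[i]))
        machine = next(i for i in by_distance if loads[i] < capacity)
        loads[machine] += 1
        assignment.append(machine)
    return assignment


def dispatch(point, sample, assignment):
    """Extend the rule: a new point follows its nearest neighbor in the sample."""
    nearest = min(range(len(sample)), key=lambda j: math.dist(point, sample[j]))
    return assignment[nearest]


# Tiny usage example with two hand-written groups of 2-D points.
sample = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
centers = farthest_first_centers(sample, k=2)
assignment = balanced_assign(sample, centers, capacity=3)
machine_for_new_point = dispatch((0.15, 0.05), sample, assignment)
```

The capacity argument caps how many sample points any one machine receives, which is one simple way to read the balancedness condition. Because the greedy pass is order-dependent, it can give a worse assignment than an optimization-based method would.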
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Data Stream Mining Techniques
