Data Driven Resource Allocation for Distributed Learning

Travis Dick; Mu Li; Venkata Krishna Pillutla; Colin White; Maria-Florina Balcan; Alex Smola

arXiv:1512.04848 · cs.LG · December 16, 2016 · 1 citation

TL;DR

This paper introduces a data-dependent dispatching method for distributed machine learning that exploits structure in the data to improve accuracy, scalability, and fault tolerance, outperforming traditional partitioning techniques.

Contribution

The paper proposes new dispatching algorithms with provable guarantees, shows how to extend a dispatching rule learned on a small sample to the full data distribution, and demonstrates stronger empirical performance than baseline methods.

Findings

1. Significantly higher accuracy than baseline methods on various datasets.
2. Strong scalability as computing resources increase.
3. Effective handling of fault tolerance and balancedness in distributed settings.

Abstract

In distributed machine learning, data is dispatched to multiple machines for processing. Motivated by the fact that similar data points often belong to the same or similar classes, and more generally, that classification rules of high accuracy tend to be "locally simple but globally complex" (Vapnik & Bottou 1993), we propose data-dependent dispatching that takes advantage of such structure. We present an in-depth analysis of this model, providing new algorithms with provable worst-case guarantees, analysis proving that existing scalable heuristics perform well under natural non-worst-case conditions, and techniques for extending a dispatching rule from a small sample to the entire distribution. We overcome novel technical challenges to satisfy important conditions for accurate distributed learning, including fault tolerance and balancedness. We empirically compare our approach with baselines based…
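To make the dispatching idea concrete, here is a minimal Python sketch of the general pattern the abstract describes: dispatch a small sample so that nearby points share machines, replicate each point to p machines (fault tolerance), cap per-machine load (balancedness), then extend the rule to new points. It is not the paper's algorithm and carries none of its provable guarantees. Farthest-first center selection, greedy capacity-capped assignment, and nearest-neighbor extension are simple stand-ins chosen for illustration, and every function and parameter name (`choose_centers`, `dispatch_sample`, `dispatch_new_point`, `capacity_slack`) is hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def choose_centers(points, k, seed=0):
    """Farthest-first traversal: k well-spread sample points act as machine 'centers'."""
    rng = random.Random(seed)
    centers = [points[rng.randrange(len(points))]]
    while len(centers) < k:
        # Next center is the point farthest from its nearest chosen center.
        centers.append(max(points, key=lambda q: min(sq_dist(q, c) for c in centers)))
    return centers


def dispatch_sample(points, k, p, capacity_slack=1.5, seed=0):
    """Send each sample point to p distinct machines while capping per-machine load.

    p > 1 replicates every point (fault tolerance); the cap keeps machines
    balanced. With capacity_slack >= k / (k - p + 1) the greedy loop below
    always finds p machines with spare room.
    """
    if not 1 <= p <= k:
        raise ValueError("need 1 <= p <= k")
    centers = choose_centers(points, k, seed)
    cap = math.ceil(capacity_slack * p * len(points) / k)
    load = [0] * k
    assignment = {}
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)
    for i in order:
        # Prefer nearby machines so that similar points end up together.
        ranked = sorted(range(k), key=lambda j: sq_dist(points[i], centers[j]))
        chosen = [j for j in ranked if load[j] < cap][:p]
        if len(chosen) < p:
            raise ValueError("capacity too tight; raise capacity_slack")
        for j in chosen:
            load[j] += 1
        assignment[i] = chosen
    return centers, assignment, load


def dispatch_new_point(x, points, assignment):
    """Extend the sample rule: send x wherever its nearest sample point went."""
    nearest = min(range(len(points)), key=lambda i: sq_dist(x, points[i]))
    return assignment[nearest]


if __name__ == "__main__":
    rng = random.Random(1)
    # Two well-separated synthetic clusters, 50 points each.
    sample = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
    sample += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)]
    centers, assignment, load = dispatch_sample(sample, k=4, p=2)
    print("per-machine load:", load)
    print("new point goes to machines:", dispatch_new_point((9.5, 10.2), sample, assignment))
```

The capacity condition in the docstring follows from counting: after t < n points are placed, at most p·t/cap < k/capacity_slack machines can be full, so at least p machines still have room whenever capacity_slack ≥ k/(k − p + 1). Note that this sketch only enforces an upper bound on load; a lower bound on load, if wanted, would need a different assignment step.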

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Data Stream Mining Techniques