DUAL-LOCO: Distributing Statistical Estimation Using Random Projections

Christina Heinze; Brian McWilliams; Nicolai Meinshausen

arXiv:1506.02554 · stat.ML · August 4, 2016 · 5 cites

TL;DR

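DUAL-LOCO is a communication-efficient algorithm for distributed statistical estimation. With data split across workers by feature, it uses low-dimensional random projections to approximate the dependencies between features held by different workers, needs only a single round of communication, and obtains better speedups than a state-of-the-art distributed optimization method while retaining good accuracy.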

Contribution

The paper introduces a distributed estimation method for data partitioned by feature rather than by sample. Workers exchange low-dimensional random projections in a single round of communication, which keeps communication overhead low.

Findings

01 Approximation error is bounded and depends only weakly on the number of workers

02 Obtains better speedups than a state-of-the-art distributed optimization method while retaining good accuracy

03 Evaluated on a variety of real-world datasets

Abstract

We present DUAL-LOCO, a communication-efficient algorithm for distributed statistical estimation. DUAL-LOCO assumes that the data is distributed according to the features rather than the samples. It requires only a single round of communication where low-dimensional random projections are used to approximate the dependences between features available to different workers. We show that DUAL-LOCO has bounded approximation error that depends only weakly on the number of workers. We compare DUAL-LOCO against a state-of-the-art distributed optimization method on a variety of real-world datasets and show that it obtains better speedups while retaining good accuracy.
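
The feature-wise scheme described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a ridge regression objective solved in closed form on each worker, Gaussian random projections, and simulated workers inside one process. The function names (`random_projection`, `ridge`, `distributed_ridge_sketch`) and the parameters `proj_dim` and `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np


def random_projection(X_block, dim, rng):
    """Compress an (n x p_k) feature block to (n x dim) with a Gaussian projection."""
    p_k = X_block.shape[1]
    Pi = rng.standard_normal((p_k, dim)) / np.sqrt(dim)
    return X_block @ Pi


def ridge(X, y, lam):
    """Closed-form ridge solution (X^T X + lam * I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def distributed_ridge_sketch(X, y, n_workers=4, proj_dim=20, lam=1.0, seed=0):
    """Simulate feature-partitioned estimation with one communication round.

    Assumption: each worker fits ridge regression on its own raw features
    plus a compressed summary of everyone else's features, then keeps only
    the coefficients of its own raw features.
    """
    rng = np.random.default_rng(seed)
    # Partition the feature indices (columns), not the samples (rows).
    blocks = np.array_split(np.arange(X.shape[1]), n_workers)

    # Single communication round: every worker shares its projected block.
    projections = [random_projection(X[:, idx], proj_dim, rng) for idx in blocks]
    total = sum(projections)

    coef = np.zeros(X.shape[1])
    for idx, own in zip(blocks, projections):
        # Summed projections of all other workers' features.
        others = total - own
        design = np.hstack([X[:, idx], others])
        beta = ridge(design, y, lam)
        # Keep only the coefficients belonging to this worker's raw features.
        coef[idx] = beta[: len(idx)]
    return coef
```

Calling `distributed_ridge_sketch(X, y, n_workers=3)` on an `n x p` design matrix returns a length-`p` coefficient vector assembled from the workers' local solutions. Each worker only ever receives `n x proj_dim` matrices from the others, so under these assumptions the communication cost depends on `proj_dim` rather than on the number of features held elsewhere.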

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques