Collaboratively Learning Linear Models with Structured Missing Data

Chen Cheng; Gary Cheng; John Duchi

arXiv:2307.11947 · stat.ML · July 25, 2023 · 1 citation

TL;DR

This paper introduces a distributed semi-supervised algorithm for collaboratively learning linear models across multiple agents with different feature subsets, achieving near-optimal estimation without sharing labeled data.

Contribution

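The paper presents Collab, a communication-efficient algorithm for multi-agent linear regression with structured missing data. Collab is nearly asymptotically local minimax optimal, even compared with estimators that may communicate labeled data, such as imputation methods, and it is tested on real and synthetic data.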

Findings

01. Nearly asymptotically local minimax optimal performance.

02. Effective in real and synthetic data scenarios.

03. Does not require sharing labeled data.

Abstract

We study the problem of collaboratively learning least squares estimates for $m$ agents. Each agent observes a different subset of the features, e.g., containing data collected from sensors of varying resolution. Our goal is to determine how to coordinate the agents in order to produce the best estimator for each agent. We propose a distributed, semi-supervised algorithm Collab, consisting of three steps: local training, aggregation, and distribution. Our procedure does not require communicating the labeled data, making it communication efficient and useful in settings where the labeled data is inaccessible. Despite this handicap, our procedure is nearly asymptotically local minimax optimal, even among estimators allowed to communicate the labeled data such as imputation methods. We test our method on real and synthetic data.
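
The three steps named in the abstract (local training, aggregation, distribution) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the aggregation rule, so the sketch makes simplifying assumptions. It assumes the feature covariance `Sigma` is known (in a semi-supervised setting it might instead be estimated from unlabeled data), it aggregates with an unweighted least-squares fit rather than any weighting the paper may use, and the names `theta`, `feature_sets`, `local_fit`, `projection`, and `aggregate` are hypothetical. The idea it relies on is a standard fact about linear models: when an agent regresses on a feature subset, its population least-squares solution is a known linear map `T` of the full parameter, so the full parameter can be recovered from the local estimates alone.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 20_000
theta = np.array([1.0, -2.0, 0.5, 3.0])       # hypothetical ground-truth parameter
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)                # feature covariance, assumed known here
feature_sets = [[0, 1], [1, 2, 3], [0, 3]]     # hypothetical feature subsets, one per agent


def local_fit(idx):
    """Step 1 (local training): OLS on the agent's own labeled sample."""
    X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    y = X @ theta + 0.1 * rng.normal(size=n)
    return np.linalg.lstsq(X[:, idx], y, rcond=None)[0]


def projection(idx):
    """Matrix T such that T @ theta is the population OLS solution on features idx."""
    return np.linalg.solve(Sigma[np.ix_(idx, idx)], Sigma[idx, :])


def aggregate(Ts, local_estimates):
    """Step 2 (aggregation): unweighted least squares over the stacked systems T_i @ theta = theta_i."""
    stacked_T = np.vstack(Ts)
    stacked_estimates = np.concatenate(local_estimates)
    return np.linalg.lstsq(stacked_T, stacked_estimates, rcond=None)[0]


Ts = [projection(idx) for idx in feature_sets]
# Only the local estimates are communicated, never the labeled samples.
local_estimates = [local_fit(idx) for idx in feature_sets]
theta_hat = aggregate(Ts, local_estimates)

# Step 3 (distribution): each agent receives the estimate restricted to its own features.
refined = [T @ theta_hat for T in Ts]
print(np.round(theta_hat, 2))
```

In this sketch the aggregation is well posed because every feature is observed by at least one agent, which makes the stacked matrix of `T` blocks full column rank. Replacing the unweighted fit with a weighted one is the natural place where a more careful procedure would differ.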

Taxonomy

Topics: Data Stream Mining Techniques · Mobile Crowdsensing and Crowdsourcing · Distributed Sensor Networks and Detection Algorithms