Collaboratively Learning Linear Models with Structured Missing Data
Chen Cheng, Gary Cheng, John Duchi

TL;DR
This paper introduces a distributed semi-supervised algorithm for collaboratively learning linear models across multiple agents with different feature subsets, achieving near-optimal estimation without sharing labeled data.
Contribution
The paper presents Collab, a communication-efficient algorithm for multi-agent linear regression with structured missing data. It is nearly asymptotically local minimax optimal, even compared with imputation methods that are allowed to communicate labeled data.
Findings
Nearly asymptotically local minimax optimal performance.
Effective in real and synthetic data scenarios.
Does not require sharing labeled data.
Abstract
We study the problem of collaboratively learning least squares estimates for m agents. Each agent observes a different subset of the features, e.g., containing data collected from sensors of varying resolution. Our goal is to determine how to coordinate the agents in order to produce the best estimator for each agent. We propose a distributed, semi-supervised algorithm Collab, consisting of three steps: local training, aggregation, and distribution. Our procedure does not require communicating the labeled data, making it communication efficient and useful in settings where the labeled data is inaccessible. Despite this handicap, our procedure is nearly asymptotically local minimax optimal, even among estimators allowed to communicate the labeled data such as imputation methods. We test our method on real and synthetic data.
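The three steps named in the abstract (local training, aggregation, distribution) can be illustrated with a minimal sketch. This is not the paper's exact procedure: it assumes the feature second-moment matrix `Sigma` is already known (in a semi-supervised setting it might be estimated from unlabeled data), and it aggregates with plain unweighted least squares rather than any optimal weighting. The names `local_map` and `collab_sketch` are hypothetical. The sketch relies on a standard fact: if y = x^T theta + noise, the population least squares coefficients on a feature subset S equal Sigma[S,S]^{-1} Sigma[S,:] theta.

```python
import numpy as np

def local_map(Sigma, obs):
    """Return T such that T @ theta gives the population least-squares
    coefficients when only the features indexed by `obs` are observed."""
    # Solves Sigma[obs, obs] @ T = Sigma[obs, :] for T.
    return np.linalg.solve(Sigma[np.ix_(obs, obs)], Sigma[obs, :])

def collab_sketch(Sigma, views, local_data):
    """Illustrative three-step pipeline (assumed simplification).

    Sigma      : (d, d) feature second-moment matrix, assumed known.
    views      : list of index lists, the features each agent observes.
    local_data : list of (X_i, y_i) pairs; X_i holds only agent i's features.
    """
    # Step 1, local training: each agent fits ordinary least squares on
    # its own features. Only these coefficients are communicated.
    local_hats = [np.linalg.lstsq(X, y, rcond=None)[0] for X, y in local_data]

    # Step 2, aggregation: find a global theta whose projections
    # T_i @ theta best match the local estimates (identity weights here).
    maps = [local_map(Sigma, obs) for obs in views]
    A = np.vstack(maps)
    b = np.concatenate(local_hats)
    theta_global = np.linalg.lstsq(A, b, rcond=None)[0]

    # Step 3, distribution: send each agent the projection of the
    # global estimate onto its own feature subset.
    return theta_global, [T @ theta_global for T in maps]
```

Note that only coefficient vectors of length at most d travel between agents and the aggregator; the labeled pairs (X_i, y_i) stay local, which is the communication property the abstract emphasizes.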
Taxonomy
Topics: Data Stream Mining Techniques · Mobile Crowdsensing and Crowdsourcing · Distributed Sensor Networks and Detection Algorithms
