EM for Mixture of Linear Regression with Clustered Data

Amirhossein Reisizadeh; Khashayar Gatmiry; Asuman Ozdaglar

arXiv:2308.11518 · cs.LG · August 23, 2023

TL;DR

This paper shows that exploiting clustered structure in distributed data reduces the number of EM iterations needed for accurate parameter estimation in a two-component mixture of linear regressions, provided EM is properly initialized.

Contribution

The paper introduces a novel analysis showing that EM converges in a constant number of iterations when the data have a clustered structure, improving on the standard logarithmic iteration complexity.

Findings

1. EM requires only O(1) iterations with structured data
2. Proper initialization is crucial for fast convergence
3. The analysis combines asymptotic optimization and generalization guarantees

Abstract

Modern data-driven and distributed learning frameworks deal with diverse massive data generated by clients spread across heterogeneous environments. Indeed, data heterogeneity is a major bottleneck in scaling up many distributed learning paradigms. In many settings, however, heterogeneous data may be generated in clusters with shared structures, as is the case in several applications such as federated learning, where a common latent variable governs the distribution of all the samples generated by a client. It is therefore natural to ask how the underlying clustered structures in distributed data can be exploited to improve learning schemes. In this paper, we tackle this question in the special case of estimating $d$-dimensional parameters of a two-component mixture of linear regressions problem where each of $m$ nodes generates $n$ samples with a shared latent variable. We employ the…
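
To make the setting in the abstract concrete, the following is a minimal sketch of EM for a two-component mixture of linear regressions in which all $n$ samples of a node share one latent label. It is not taken from the paper. It assumes a symmetric model $y_{ij} = z_i \langle x_{ij}, \theta^* \rangle + \varepsilon_{ij}$ with $z_i \in \{-1, +1\}$ equally likely, standard Gaussian covariates, and a known noise level `sigma`. The function names `simulate`, `em_step`, and `run_em`, and all numeric settings, are hypothetical choices for illustration.

```python
import numpy as np


def simulate(m, n, d, sigma, rng):
    """Draw m nodes with n samples each; every node shares one latent sign z_i."""
    theta_star = rng.standard_normal(d)
    theta_star /= np.linalg.norm(theta_star)   # unit-norm ground truth (assumption)
    z = rng.choice([-1.0, 1.0], size=m)        # one latent label per node
    X = rng.standard_normal((m, n, d))         # Gaussian covariates (assumption)
    y = z[:, None] * (X @ theta_star) + sigma * rng.standard_normal((m, n))
    return X, y, theta_star


def em_step(theta, X, y, sigma):
    """One EM iteration for the assumed model y_ij = z_i <x_ij, theta> + noise."""
    m, n, d = X.shape
    # E-step: the posterior mean of z_i pools evidence from all n samples of node i.
    w = np.tanh(np.sum(y * (X @ theta), axis=1) / sigma**2)   # shape (m,)
    # M-step: least squares against the soft-labelled responses w_i * y_ij.
    X_flat = X.reshape(m * n, d)
    rhs = X_flat.T @ (w[:, None] * y).reshape(m * n)
    return np.linalg.solve(X_flat.T @ X_flat, rhs)


def run_em(theta0, X, y, sigma, num_iters):
    """Apply em_step a fixed number of times starting from theta0."""
    theta = theta0
    for _ in range(num_iters):
        theta = em_step(theta, X, y, sigma)
    return theta


rng = np.random.default_rng(0)
X, y, theta_star = simulate(m=200, n=20, d=5, sigma=0.5, rng=rng)
theta0 = theta_star + 0.3 * rng.standard_normal(5)   # initialization near the truth
theta_hat = run_em(theta0, X, y, sigma=0.5, num_iters=5)
# The model is identifiable only up to sign, so compare against both +/- theta_star.
err = min(np.linalg.norm(theta_hat - theta_star),
          np.linalg.norm(theta_hat + theta_star))
print(f"sign-invariant estimation error: {err:.3f}")
```

In this sketch the clustered structure enters only through the E-step, where the argument of `tanh` sums over the $n$ samples of a node. With $n = 1$ the update reduces to the usual per-sample soft label for the symmetric model.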

Taxonomy

Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Statistical Methods and Bayesian Inference