Simultaneous semi-parametric estimation of clustering and regression
Matthieu Marbac, Mohammed Sedki, Christophe Biernacki, Vincent Vandewalle

TL;DR
This paper proposes a simultaneous semi-parametric method for estimating clustering and regression models when group labels are missing, improving over traditional two-step approaches by jointly modeling both aspects.
Contribution
It introduces a novel joint estimation approach for clustering and regression, addressing biases from sequential methods and demonstrating its effectiveness on real health data.
Findings
Joint estimation reduces the bias in the regression parameters.
The method performs well across various distributions and models.
An application to health data shows its practical relevance.
Abstract
We investigate the parameter estimation of regression models with fixed group effects, when the group variable is missing while group-related variables are available. This problem involves clustering to infer the missing group variable from the group-related variables, and regression to build a model of the target variable given the group and possibly additional variables. Thus, this problem can be formulated as the joint distribution modeling of the target and of the group-related variables. The usual parameter estimation strategy for this joint model is a two-step approach that starts by learning the group variable (clustering step) and then plugs in its estimator to fit the regression model (regression step). However, this approach is suboptimal (providing in particular biased regression estimates) since it does not make use of the target variable for clustering. Thus, we…
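The bias of the two-step approach described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' estimator; it is a toy setup under assumptions chosen purely for illustration: two equally likely hidden groups, a single Gaussian group-related variable `x`, a target `y` that depends only on a fixed group effect, and plain 1-D 2-means as the clustering step. All names (`simulate`, `two_means_1d`, `group_effects`) and parameter values are hypothetical.

```python
import random
import statistics


def simulate(n, rng, mu=1.0, beta=(0.0, 2.0), sigma=0.5):
    """Draw (x, y, z) triples: z is the hidden group, x a group-related
    variable, y the target with a fixed group effect beta[z]."""
    data = []
    for _ in range(n):
        z = rng.randrange(2)                        # hidden group label
        x = rng.gauss(mu if z == 1 else -mu, 1.0)   # overlapping clusters
        y = beta[z] + rng.gauss(0.0, sigma)         # regression on the group only
        data.append((x, y, z))
    return data


def two_means_1d(xs, iters=25):
    """Plain 1-D 2-means (Lloyd iterations). Label 1 is the cluster
    with the larger centre, so labels are comparable to the true groups."""
    c0, c1 = min(xs), max(xs)
    labels = []
    for _ in range(iters):
        labels = [1 if abs(x - c1) < abs(x - c0) else 0 for x in xs]
        c0 = statistics.fmean(x for x, l in zip(xs, labels) if l == 0)
        c1 = statistics.fmean(x for x, l in zip(xs, labels) if l == 1)
    return labels


def group_effects(ys, labels):
    """Least-squares fit of y on group indicators = per-group mean of y."""
    return [statistics.fmean(y for y, l in zip(ys, labels) if l == g)
            for g in (0, 1)]


rng = random.Random(0)
data = simulate(20000, rng)
xs = [d[0] for d in data]
ys = [d[1] for d in data]
zs = [d[2] for d in data]

# Oracle: regression using the true (normally unobserved) groups.
oracle = group_effects(ys, zs)
# Two-step: cluster on x alone, then plug the estimated labels into the regression.
plug_in = group_effects(ys, two_means_1d(xs))

oracle_gap = oracle[1] - oracle[0]
plug_in_gap = plug_in[1] - plug_in[0]
print(f"oracle gap between group effects:   {oracle_gap:.3f}")
print(f"two-step gap between group effects: {plug_in_gap:.3f}")
```

In this toy setting the clustering step misclassifies a share of the observations because the two clusters overlap in `x`, so each estimated group mixes observations from both true groups and the estimated gap between group effects shrinks toward zero. Using `y` during clustering, as a joint model does, is what gives the regression step a chance to avoid this attenuation.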
