Handling Covariate Mismatch in Federated Linear Prediction

Alexis Ayme; Rémi Khellaf

arXiv:2602.02083·math.ST·February 3, 2026

Open Access

TL;DR

This paper addresses covariate mismatch in federated linear prediction, where each client observes a different subset of features. It proposes modular methods for the low- and high-dimensional settings, with theoretical guarantees on learning rates.

Contribution

The paper introduces two approaches to federated learning under covariate mismatch, a plug-in estimator and an impute-then-regress strategy, each supported by theoretical analysis.

Findings

01. Effective covariance and cross-moment estimation in low dimensions.

02. Imputation-based linear modeling in high dimensions.

03. Explicit learning rate characterizations under covariate mismatch.

Abstract

Federated learning enables institutions to train predictive models collaboratively without sharing raw data, addressing privacy and regulatory constraints. In the standard horizontal setting, clients hold disjoint cohorts of individuals and collaborate to learn a shared predictor. Most existing methods, however, assume that all clients measure the same features. We study the more realistic setting of covariate mismatch, where each client observes a different subset of features, which typically arises in multicenter collaborations with no prior agreement on data collection. We formalize learning a linear prediction under client-wise MCAR patterns and develop two modular approaches tailored to the dimensional regime and communication budget. In the low-dimensional setting, we propose a plug-in estimator that approximates the oracle linear predictor by aggregating sufficient statistics to…
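The low-dimensional idea of aggregating sufficient statistics can be illustrated with a minimal Python sketch. This is not the authors' implementation, and every detail is an assumption made for illustration: the function names `client_statistics` and `plug_in_estimator` are hypothetical, features are taken to be centered (no intercept), every pair of features is assumed to be jointly observed by at least one client, and second moments are estimated by simple entry-wise averaging over the clients that observe them.

```python
import numpy as np

def client_statistics(X, y, observed, d):
    """Hypothetical helper: statistics from one client that sees only the
    feature indices in `observed` (X has one column per observed feature)."""
    idx = np.asarray(observed)
    n = X.shape[0]
    S = np.zeros((d, d))  # sums of x_j * x_k over this client's samples
    N = np.zeros((d, d))  # sample counts behind each (j, k) entry
    c = np.zeros(d)       # sums of x_j * y
    m = np.zeros(d)       # sample counts behind each entry of c
    S[np.ix_(idx, idx)] = X.T @ X
    N[np.ix_(idx, idx)] = n
    c[idx] = X.T @ y
    m[idx] = n
    return S, N, c, m

def plug_in_estimator(stats, ridge=0.0):
    """Hypothetical server step: average each moment entry over the clients
    that observed it, then solve the normal equations."""
    S = sum(s[0] for s in stats)
    N = sum(s[1] for s in stats)
    c = sum(s[2] for s in stats)
    m = sum(s[3] for s in stats)
    if (N == 0).any() or (m == 0).any():
        raise ValueError("some pair of features is never jointly observed")
    sigma_hat = S / N   # entry-wise estimate of E[x x^T]
    gamma_hat = c / m   # entry-wise estimate of E[x y]
    d = c.shape[0]
    return np.linalg.solve(sigma_hat + ridge * np.eye(d), gamma_hat)

# Simulated example: three clients, each missing one of three features.
rng = np.random.default_rng(0)
d, n = 3, 20000
beta_true = np.array([1.0, -2.0, 0.5])
patterns = [[0, 1], [1, 2], [0, 2]]
stats = []
for observed in patterns:
    X_full = rng.standard_normal((n, d))
    y = X_full @ beta_true + 0.1 * rng.standard_normal(n)
    # The client only ever touches its observed columns.
    stats.append(client_statistics(X_full[:, observed], y, observed, d))
beta_hat = plug_in_estimator(stats)
```

In this sketch only the moment sums and counts leave each client, never individual records. The entry-wise averaged matrix is not guaranteed to be positive semidefinite in finite samples, which is why a small optional `ridge` term is included as a stabilizing assumption.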

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Advanced Causal Inference Techniques