Increasing Missingness to Reduce Bias: Richardson-SGD with Missing Data
Ferdinand Genans (SU, LPSM), Erwan Scornet (SU, LPSM)

TL;DR
This paper introduces Richardson-SGD, a novel debiasing technique that deliberately adds missingness to reduce gradient bias in stochastic gradient methods with incomplete data, improving learning accuracy.
Contribution
The paper proposes a simple, model-agnostic Richardson extrapolation-based method to cancel gradient bias caused by missing data in stochastic gradient descent.
Findings
Richardson-SGD reduces the gradient bias from O(∥p∥) to O(∥p∥²), where p is the vector of missingness ratios.
Empirical results show improved optimization and estimation in generalized linear models.
Adding controlled missingness enhances stochastic learning from incomplete data.
Abstract
Stochastic gradient methods are central to modern large-scale learning, but their use with incomplete covariates remains delicate, since imputation schemes generally introduce systematic gradient biases, as shown for linear models. In this work, we prove that all parametric models exhibit similar gradient bias for various imputation procedures and characterize exactly the dependence on the missingness ratio vector p, with O(∥p∥) as the leading term. We exploit this analysis to propose a simple debiasing procedure for stochastic gradient descent (SGD) with missing values based on Richardson extrapolation, which leverages the exact expression of the gradient bias. The key idea is to deliberately add missingness: from an already incomplete observation, we generate a further-thinned version at a higher, controlled missingness level, and combine the two resulting stochastic…
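The thinning-and-combining idea can be illustrated on a toy case. The sketch below is not the authors' implementation. It assumes a squared loss for a linear model, zero imputation, and a single scalar missingness rate p applied independently to every coordinate (MCAR), whereas the paper works with a vector p and general parametric models. Under these assumptions the expected imputed gradient G(p) is a polynomial in p, so the combination 2·G(p) − G(2p) cancels the first-order term and leaves a bias of order p². All function names (thin, grad_sq_loss, richardson_grad, expected_grad) and the toy values are hypothetical.

```python
import numpy as np

def thin(x_imp, mask, p, q, rng):
    """Thin a zero-imputed observation from missingness level p up to q >= p.

    Each observed entry is kept with probability (1 - q) / (1 - p), so an entry
    that was observed with probability 1 - p ends up observed with probability 1 - q.
    """
    keep = rng.random(mask.shape) < (1.0 - q) / (1.0 - p)
    new_mask = mask & keep
    return np.where(new_mask, x_imp, 0.0), new_mask

def grad_sq_loss(theta, x_imp, y):
    """Gradient of 0.5 * (x^T theta - y)^2, evaluated at the zero-imputed x."""
    return x_imp * (x_imp @ theta - y)

def richardson_grad(theta, x_imp, mask, y, p, rng):
    """Combine the gradients at levels p and 2p (assumes 2p <= 1)."""
    x_thin, _ = thin(x_imp, mask, p, 2.0 * p, rng)
    return 2.0 * grad_sq_loss(theta, x_imp, y) - grad_sq_loss(theta, x_thin, y)

def expected_grad(theta, Sigma, b, p):
    """Closed-form expected zero-imputed gradient under homogeneous MCAR.

    Sigma = E[x x^T] and b = E[x y]. Off-diagonal second moments shrink by
    (1 - p)^2, diagonal ones and E[x y] shrink by (1 - p).
    """
    D = np.diag(np.diag(Sigma))
    second_moment = (1.0 - p) ** 2 * (Sigma - D) + (1.0 - p) * D
    return second_moment @ theta - (1.0 - p) * b

# Toy second moments and evaluation point (illustrative values only).
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
b = np.array([0.3, 0.7])
theta = np.array([1.0, -1.0])

def naive_bias(p):
    """Norm of the bias of the plain zero-imputed gradient."""
    return np.linalg.norm(expected_grad(theta, Sigma, b, p)
                          - expected_grad(theta, Sigma, b, 0.0))

def richardson_bias(p):
    """Norm of the bias of the extrapolated gradient 2 G(p) - G(2p)."""
    combined = (2.0 * expected_grad(theta, Sigma, b, p)
                - expected_grad(theta, Sigma, b, 2.0 * p))
    return np.linalg.norm(combined - expected_grad(theta, Sigma, b, 0.0))

if __name__ == "__main__":
    for p in (0.2, 0.1, 0.05):
        print(f"p={p}: naive={naive_bias(p):.4f}, richardson={richardson_bias(p):.4f}")
```

In this toy setting, halving p roughly halves the naive bias but divides the extrapolated bias by four, which is the first-order versus second-order behaviour described above. The combined gradient typically has higher variance than the plain one, a trade-off this sketch does not measure.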
