Increasing Missingness to Reduce Bias: Richardson-SGD with Missing Data

Ferdinand Genans (SU; LPSM); Erwan Scornet (SU; LPSM)

arXiv:2605.19641·stat.ML·May 20, 2026

TL;DR

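This paper introduces Richardson-SGD, a debiasing technique for stochastic gradient methods on incomplete data. It deliberately adds missingness to an already incomplete observation and combines the resulting stochastic gradients by Richardson extrapolation, which cancels the leading-order gradient bias introduced by imputation.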

Contribution

The paper proposes a simple, model-agnostic Richardson extrapolation-based method to cancel gradient bias caused by missing data in stochastic gradient descent.

Findings

01

Richardson-SGD reduces gradient bias from O(∥p∥) to O(∥p∥²).

02

Empirical results show improved optimization and estimation in generalized linear models.

03

Deliberately adding controlled missingness to already incomplete observations improves stochastic learning from incomplete data.

Abstract

Stochastic gradient methods are central to modern large-scale learning, but their use with incomplete covariates remains delicate since imputation schemes generally introduce systematic gradient biases, as shown for linear models. In this work, we prove that all parametric models exhibit similar gradient bias for various imputation procedures and characterize exactly the dependence on the missingness ratio vector $p$, with $O(\|p\|)$ as the leading term. We exploit this analysis to propose a simple debiasing procedure for stochastic gradient descent (SGD) with missing values based on Richardson extrapolation, which leverages the exact expression of the gradient bias. The key idea is to deliberately add missingness: from an already incomplete observation, we generate a further-thinned version at a higher, controlled missingness level, and combine the two resulting stochastic…
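The thinning-and-combining idea can be illustrated with a minimal sketch. It is not the authors' implementation, and every specific in it is an assumption made for illustration: a linear model with squared loss, zero imputation, entries missing completely at random with a single known scalar rate `p` (the paper works with a vector $p$), a thinned copy at missingness `2 * p`, and the textbook Richardson weights 2 and -1. The helper names `zero_impute_grad`, `thin`, and `richardson_grad` are hypothetical. If the expected imputed gradient is the true gradient plus $A p + O(p^2)$, then twice the gradient at level $p$ minus the gradient at level $2p$ removes the term that is linear in $p$.

```python
import numpy as np


def zero_impute_grad(theta, x_obs, y):
    """Mean squared-loss gradient of a linear model; missing entries are already 0."""
    residual = x_obs @ theta - y
    return (residual[:, None] * x_obs).mean(axis=0)


def thin(x_obs, p, p_target, rng):
    """Drop extra entries so overall missingness rises from p to p_target."""
    keep_prob = (1.0 - p_target) / (1.0 - p)
    return x_obs * (rng.random(x_obs.shape) < keep_prob)


def richardson_grad(theta, x_obs, y, p, rng):
    """Combine gradients at missingness p and 2p (requires 2p < 1).

    Assumed combination: 2 * g(p) - g(2p), which cancels the O(p) bias term.
    """
    g_p = zero_impute_grad(theta, x_obs, y)
    g_2p = zero_impute_grad(theta, thin(x_obs, p, 2.0 * p, rng), y)
    return 2.0 * g_p - g_2p


# Synthetic check with correlated covariates (all values are illustrative).
rng = np.random.default_rng(0)
n, p = 200_000, 0.1
cov = np.full((3, 3), 0.5) + 0.5 * np.eye(3)
theta_star = np.array([1.0, -2.0, 0.5])
theta = np.full(3, 0.5)

x = rng.multivariate_normal(np.zeros(3), cov, size=n)
y = x @ theta_star
x_obs = x * (rng.random(x.shape) >= p)  # MCAR mask followed by zero imputation

g_full = zero_impute_grad(theta, x, y)  # reference gradient on complete data
naive_err = np.linalg.norm(zero_impute_grad(theta, x_obs, y) - g_full)
rich_err = np.linalg.norm(richardson_grad(theta, x_obs, y, p, rng) - g_full)
print(f"naive error: {naive_err:.4f}, Richardson error: {rich_err:.4f}")
```

In this toy setting the expected zero-imputed gradient is a quadratic polynomial in `p`, so the combined estimate keeps only a bias of order `p ** 2`. The price is a higher variance, since two noisy gradients are combined with weights 2 and -1.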
