Stochastic Gradients under Nuisances

Facheng Yu; Ronak Mehta; Alex Luedtke; Zaid Harchaoui

arXiv:2508.20326 · stat.ML · August 29, 2025

TL;DR

This paper analyzes stochastic gradient algorithms for learning problems with unknown nuisance parameters, providing convergence guarantees and showing that orthogonalization techniques can ensure reliable optimization.

Contribution

It establishes non-asymptotic convergence results for stochastic gradients in the presence of nuisances and introduces variants with approximate orthogonalization for broader applicability.

Findings

01. Classical stochastic gradient methods can converge despite nuisances under Neyman orthogonality.

02. Orthogonalized update algorithms achieve similar convergence rates without strict orthogonality.

03. Applications include orthogonal statistical learning, double machine learning, and causal inference.

Abstract

Stochastic gradient optimization is the dominant learning paradigm for a variety of scenarios, from classical supervised learning to modern self-supervised learning. We consider stochastic gradient algorithms for learning problems whose objectives rely on unknown nuisance parameters, and establish non-asymptotic convergence guarantees. Our results show that, while the presence of a nuisance can alter the optimum and upset the optimization trajectory, the classical stochastic gradient algorithm may still converge under appropriate conditions, such as Neyman orthogonality. Moreover, even when Neyman orthogonality is not satisfied, we show that an algorithm variant with approximately orthogonalized updates (with an approximately orthogonalized gradient oracle) may achieve similar convergence rates. Examples from orthogonal statistical learning/double machine learning and causal inference…
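
To make the setting concrete, here is a minimal sketch of stochastic gradient descent on a Neyman-orthogonal loss with imperfect plug-in nuisances. It is not the authors' algorithm or experiment. The partially linear model, the residual-on-residual squared loss, the functions `simulate`, `sgd_orthogonal`, `m_hat`, and `l_hat`, the nuisance error `DELTA`, and the step-size schedule are all assumptions chosen for illustration.

```python
import math
import random

THETA_TRUE = 2.0   # hypothetical target parameter
DELTA = 0.1        # hypothetical constant error in both nuisance estimates


def simulate(n, theta_true, rng):
    """Draw n samples from Y = theta*D + g(X) + eps with D = m(X) + eta."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        d = math.sin(x) + rng.gauss(0.0, 1.0)             # m(x) = sin(x)
        y = theta_true * d + x * x + rng.gauss(0.0, 1.0)  # g(x) = x^2
        data.append((x, d, y))
    return data


def m_hat(x):
    """Imperfect estimate of E[D | X = x] (true value: sin(x))."""
    return math.sin(x) + DELTA


def l_hat(x):
    """Imperfect estimate of E[Y | X = x] (true value: theta*sin(x) + x^2)."""
    return THETA_TRUE * math.sin(x) + x * x + DELTA


def sgd_orthogonal(data, m_fn, l_fn):
    """One pass of SGD on 0.5 * ((y - l(x)) - theta * (d - m(x)))^2."""
    theta = 0.0
    for t, (x, d, y) in enumerate(data, start=1):
        d_res = d - m_fn(x)                       # treatment residual
        y_res = y - l_fn(x)                       # outcome residual
        grad = -(y_res - theta * d_res) * d_res   # gradient in theta
        theta -= grad / (t + 10.0)                # decaying step size
    return theta


rng = random.Random(0)
samples = simulate(20000, THETA_TRUE, rng)
theta_sgd = sgd_orthogonal(samples, m_hat, l_hat)
print("SGD estimate:", theta_sgd, "target:", THETA_TRUE)
```

In this toy model the minimizer of the plugged-in loss is (θ + δ²)/(1 + δ²), so a nuisance error of size δ shifts the optimum only at order δ². That second-order sensitivity is the kind of behavior Neyman orthogonality is meant to deliver, and it is why the iterate should land close to the target even though both nuisance estimates are wrong.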
