Stein-Rule Shrinkage for Stochastic Gradient Estimation in High Dimensions

M. Arashi; M. Amintoosi

arXiv:2602.01777 · cs.LG · February 10, 2026

Open Access

TL;DR

This paper introduces a Stein-rule shrinkage framework for stochastic gradient estimation in high-dimensional deep learning, leading to an improved optimizer called SR-Adam that outperforms standard Adam in noisy, large-batch settings.

Contribution

It develops a novel high-dimensional shrinkage estimator for stochastic gradients, integrated into Adam, with theoretical optimality and practical improvements demonstrated on image classification tasks.

Findings

01. SR-Adam outperforms Adam in large-batch regimes.

02. Shrinkage applied to convolutional layers yields most of the gains.

03. The method is minimax-optimal under Gaussian noise assumptions.

Abstract

Stochastic gradient methods are central to large-scale learning, but they treat mini-batch gradients as unbiased estimators, which classical decision theory shows are inadmissible in high dimensions. We formulate gradient computation as a high-dimensional estimation problem and introduce a framework based on Stein-rule shrinkage. We construct a gradient estimator that adaptively contracts noisy mini-batch gradients toward a stable estimator derived from historical momentum. The shrinkage intensity is determined in a data-driven manner using an online estimate of gradient noise variance, leveraging statistics from adaptive optimizers. Under a Gaussian noise model, we show our estimator uniformly dominates the standard stochastic gradient under squared error loss and is minimax-optimal. We incorporate this into the Adam optimizer, yielding SR-Adam, a practical algorithm with negligible…
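
The shrinkage step described in the abstract can be illustrated with the classical positive-part James-Stein rule, which contracts a noisy vector toward a fixed target. The sketch below is a minimal illustration under stated assumptions, not the paper's SR-Adam update: the function name `stein_shrink`, the argument `noise_var`, and the use of a fixed `momentum` vector as the target are hypothetical, and the noise variance is passed in as known rather than estimated online from optimizer statistics.

```python
import numpy as np


def stein_shrink(grad, target, noise_var):
    """Positive-part James-Stein shrinkage of `grad` toward `target`.

    Assumes grad ~ N(true_grad, noise_var * I). All names are illustrative
    and are not taken from the paper.
    """
    grad = np.asarray(grad, dtype=float)
    target = np.asarray(target, dtype=float)
    diff = grad - target
    p = diff.size
    sq_dist = float(np.sum(diff ** 2))
    # The classical rule needs p >= 3; with zero distance there is nothing to shrink.
    if p < 3 or sq_dist == 0.0:
        return grad.copy()
    # Shrink harder when the noise is large relative to the distance from the target.
    factor = max(0.0, 1.0 - (p - 2) * noise_var / sq_dist)
    return target + factor * diff


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, sigma, trials = 1000, 1.0, 200
    true_grad = rng.normal(size=p)
    # Stand-in for a stable momentum estimate (an assumption for this demo).
    momentum = true_grad + 0.1 * rng.normal(size=p)
    raw_err, shrunk_err = 0.0, 0.0
    for _ in range(trials):
        g = true_grad + sigma * rng.normal(size=p)  # simulated mini-batch gradient
        raw_err += np.sum((g - true_grad) ** 2)
        shrunk_err += np.sum((stein_shrink(g, momentum, sigma ** 2) - true_grad) ** 2)
    print("mean squared error, raw mini-batch gradient:", raw_err / trials)
    print("mean squared error, shrunk gradient:", shrunk_err / trials)
```

The positive-part clamp keeps the shrinkage factor in [0, 1], so the result always lies on the segment between the target and the raw gradient. When the raw gradient is far from the target relative to the noise level, the factor approaches 1 and the gradient is left nearly unchanged.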

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Gaussian Processes and Bayesian Inference