Fast Compute for ML Optimization

Nick Polson; Vadim Sokolov

arXiv:2602.14280 · stat.CO · February 17, 2026

Open Access

TL;DR

This paper introduces the Scale Mixture EM (SM-EM) algorithm for losses that admit a variance-mean scale-mixture representation. SM-EM removes user-specified learning-rate and momentum schedules and, with Nesterov acceleration, attains up to 13x lower final loss than grid-search-tuned Adam on synthetic ill-conditioned logistic regression benchmarks.

Contribution

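The paper proposes an EM-based optimization algorithm in which each iteration is a weighted least squares update. Model-derived latent variables supply the observation and parameter weights, playing roles analogous to Adam's second-moment scaling and AdamW's weight decay, so the user does not have to specify a learning-rate or momentum schedule.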

Findings

01

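On synthetic ill-conditioned logistic regression benchmarks, SM-EM with Nesterov acceleration attains up to 13x lower final loss than Adam tuned by learning-rate grid search.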

02

For a 40-point regularization path, sharing sufficient statistics across penalty values reduces runtime by 10x relative to the same tuned-Adam protocol.

03

For the base (non-accelerated) algorithm, EM guarantees a nonincreasing objective; acceleration improves convergence speed.

Abstract

We study optimization for losses that admit a variance-mean scale-mixture representation. Under this representation, each EM iteration is a weighted least squares update in which latent variables determine observation and parameter weights; these play roles analogous to Adam's second-moment scaling and AdamW's weight decay, but are derived from the model. The resulting Scale Mixture EM (SM-EM) algorithm removes user-specified learning-rate and momentum schedules. On synthetic ill-conditioned logistic regression benchmarks with $p \in \{20, \dots, 500\}$, SM-EM with Nesterov acceleration attains up to $13 \times$ lower final loss than Adam tuned by learning-rate grid search. For a 40-point regularization path, sharing sufficient statistics across penalty values yields a $10 \times$ runtime reduction relative to the same tuned-Adam protocol. For the base (non-accelerated) algorithm, EM…
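To make the "EM iteration as weighted least squares" idea concrete, here is a minimal sketch for ridge-penalized logistic regression. It assumes the well-known Polya-Gamma scale-mixture representation of the logistic loss, under which the E-step weight is tanh(psi/2)/(2 psi) and the M-step is a linear solve. The fixed ridge penalty `lam`, the function names, and the synthetic data are illustrative assumptions, not taken from the paper; the paper's own SM-EM updates (latent parameter weights, Nesterov acceleration) may differ.

```python
import numpy as np

def penalized_logistic_loss(beta, X, y, lam):
    """Logistic negative log-likelihood (y in {0, 1}) plus a ridge penalty."""
    psi = X @ beta
    # log(1 + exp(psi)) evaluated stably with logaddexp
    return np.sum(np.logaddexp(0.0, psi) - y * psi) + 0.5 * lam * (beta @ beta)

def em_step(beta, X, y, lam):
    """One EM iteration under the assumed Polya-Gamma representation."""
    psi = X @ beta
    # E-step: observation weights E[omega_i] = tanh(psi_i / 2) / (2 psi_i),
    # with the limiting value 1/4 at psi_i = 0.
    near_zero = np.abs(psi) < 1e-8
    safe = np.where(near_zero, 1.0, psi)
    omega = np.where(near_zero, 0.25, np.tanh(safe / 2.0) / (2.0 * safe))
    # M-step: weighted least squares, solved as a linear system.
    kappa = y - 0.5
    A = X.T @ (omega[:, None] * X) + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ kappa)

def run_em(X, y, lam=1.0, n_iter=50):
    """Run base (non-accelerated) EM from zero and record the objective."""
    beta = np.zeros(X.shape[1])
    losses = [penalized_logistic_loss(beta, X, y, lam)]
    for _ in range(n_iter):
        beta = em_step(beta, X, y, lam)
        losses.append(penalized_logistic_loss(beta, X, y, lam))
    return beta, np.array(losses)

# Hypothetical ill-conditioned problem: column scales span two decades.
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p)) * np.logspace(0, -2, p)
beta_true = rng.standard_normal(p)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ beta_true)))).astype(float)

beta_hat, losses = run_em(X, y)
```

No step size appears anywhere in the loop: the weights `omega` come from the model and are recomputed at every iteration. The recorded `losses` can be inspected to check that the objective does not increase from one iteration to the next, which is the property the base algorithm is stated to guarantee.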

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Statistical Methods and Inference