Decoupled Descent: Exact Test Error Tracking Via Approximate Message Passing

Max Lovig

arXiv:2604.27883·math.ST·May 1, 2026

TL;DR

Decoupled Descent (DD) is a training algorithm that uses approximate message passing to cancel the bias introduced by reusing training data, so that the train error asymptotically tracks the test error for stylized Gaussian mixture models.

Contribution

We introduce Decoupled Descent, a theory-based training algorithm that enforces a train-test identity by cancelling data-reuse bias, enabling zero-cost validation with no data held out.

Findings

01. DD outperforms gradient descent on XOR classification.

02. DD narrows the generalization gap on noisy MNIST.

03. DD's dynamics are governed by a low-dimensional state evolution.

Abstract

In modern parametric model training, full-batch gradient descent (and its variants) suffers due to progressively stronger biasing towards the exact realization of training data; this drives the systematic "generalization gap", where the train error becomes an unreliable proxy for test error. Existing approaches either argue this gap is benign through complex analysis or sacrifice data to a validation set. In contrast, we introduce decoupled descent (DD), a novel theory-based training algorithm that satisfies a train-test identity -- enforcing the train error to asymptotically track the test error for stylized Gaussian mixture models. Within this specific regime, leveraging approximate message passing theory, DD iteratively cancels the biases due to data reuse, rigorously demonstrating the feasibility of zero-cost validation and $100\%$ data utilization. Moreover, DD is governed by a…
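The generalization gap described in the abstract can be illustrated with a minimal sketch. This is not the paper's Decoupled Descent algorithm, whose update rule is not given on this page. It is only plain full-batch gradient descent on a logistic loss, run on a two-component Gaussian mixture. The function name `gd_train_test_errors`, the unit-norm class mean, the dimensions, the step size, and the seed are all assumptions chosen for illustration.

```python
import numpy as np


def gd_train_test_errors(n=200, d=400, n_test=5000, steps=300, lr=0.1, seed=0):
    """Full-batch GD on logistic loss; returns per-step train and test 0-1 error.

    Illustrative baseline only (assumed setup), not the paper's DD method.
    """
    rng = np.random.default_rng(seed)
    mu = np.ones(d) / np.sqrt(d)  # assumed unit-norm class mean

    def sample(m):
        # Two-component Gaussian mixture: x = y * mu + standard normal noise.
        y = rng.choice([-1.0, 1.0], size=m)
        x = y[:, None] * mu + rng.standard_normal((m, d))
        return x, y

    x_tr, y_tr = sample(n)
    x_te, y_te = sample(n_test)

    w = np.zeros(d)
    train_err, test_err = [], []
    for _ in range(steps):
        margins = y_tr * (x_tr @ w)
        # sigmoid(-margins), written with tanh for numerical stability
        weights = 0.5 * (1.0 - np.tanh(margins / 2.0))
        grad = -(x_tr.T @ (y_tr * weights)) / n
        w -= lr * grad  # every step reuses the same n training points
        train_err.append(np.mean(y_tr * (x_tr @ w) <= 0))
        test_err.append(np.mean(y_te * (x_te @ w) <= 0))
    return np.array(train_err), np.array(test_err)


if __name__ == "__main__":
    tr, te = gd_train_test_errors()
    print(f"final train error: {tr[-1]:.3f}, final test error: {te[-1]:.3f}")
```

With more parameters than samples (d > n here), the training points are linearly separable, so the train error is driven toward zero while the error on fresh samples stays well above it. That discrepancy is the gap DD is designed to remove.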
