Decoupled Descent: Exact Test Error Tracking Via Approximate Message Passing
Max Lovig

TL;DR
Decoupled Descent (DD) is a new training algorithm that uses approximate message passing to cancel the bias caused by data reuse, so that the train error asymptotically tracks the test error for stylized Gaussian mixture models; it also improves generalization in the reported experiments.
Contribution
We introduce Decoupled Descent, a theory-based method that enforces a train-test error identity and cancels data-reuse bias, enabling zero-cost validation and better generalization.
Findings
DD outperforms gradient descent on XOR classification.
DD narrows the generalization gap on noisy MNIST.
DD's dynamics are governed by a low-dimensional state evolution.
Abstract
In modern parametric model training, full-batch gradient descent (and its variants) becomes progressively more biased towards the exact realization of the training data; this drives the systematic "generalization gap", where the train error becomes an unreliable proxy for the test error. Existing approaches either argue, through complex analysis, that this gap is benign or sacrifice data to a validation set. In contrast, we introduce decoupled descent (DD), a novel theory-based training algorithm that satisfies a train-test identity: the train error is forced to asymptotically track the test error for stylized Gaussian mixture models. Within this specific regime, leveraging approximate message passing theory, DD iteratively cancels the biases due to data reuse, rigorously demonstrating the feasibility of zero-cost validation and data utilization. Moreover, DD is governed by a…
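The generalization gap described in the abstract can be reproduced with a small numerical sketch. The code below is not Decoupled Descent and does not implement any approximate message passing correction; it only runs plain full-batch gradient descent with a logistic loss on a symmetric two-component Gaussian mixture and records train and test loss at each step. The function names, dimensions, step size, and unit-norm class mean are all assumptions chosen for illustration, not settings taken from the paper.

```python
import numpy as np

def sample_mixture(n, d, mu, rng):
    """Draw n points from a symmetric two-component Gaussian mixture: x = y * mu + z."""
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mu[None, :] + rng.standard_normal((n, d))
    return x, y

def logistic_loss(w, x, y):
    """Mean of log(1 + exp(-y * <x, w>)), computed stably via logaddexp."""
    return np.mean(np.logaddexp(0.0, -y * (x @ w)))

def run_gd(n=200, d=400, steps=200, lr=0.2, seed=0):
    """Full-batch gradient descent; returns an array of (train_loss, test_loss) per step."""
    rng = np.random.default_rng(seed)
    mu = np.ones(d) / np.sqrt(d)  # hypothetical unit-norm class mean
    x_tr, y_tr = sample_mixture(n, d, mu, rng)
    x_te, y_te = sample_mixture(5000, d, mu, rng)  # fresh samples stand in for test error
    w = np.zeros(d)
    history = []
    for _ in range(steps):
        margins = y_tr * (x_tr @ w)
        # sigmoid(-margin) written with tanh to avoid overflow
        sig = 0.5 * (1.0 - np.tanh(0.5 * margins))
        # Gradient of the mean logistic loss over the same training set every step
        grad = -(x_tr * (y_tr * sig)[:, None]).mean(axis=0)
        w -= lr * grad
        history.append((logistic_loss(w, x_tr, y_tr), logistic_loss(w, x_te, y_te)))
    return np.array(history)

if __name__ == "__main__":
    hist = run_gd()
    for t in (0, 49, 199):
        print(f"step {t + 1:3d}  train={hist[t, 0]:.3f}  test={hist[t, 1]:.3f}")
```

Because every step reuses the same training samples, the iterate becomes correlated with the training noise, and the train loss falls below the loss on fresh samples. This is the effect that, according to the abstract, DD is designed to cancel.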
