Gradient Descent as Implicit EM in Distance-Based Neural Models
Alan Oursland

TL;DR
This paper shows that gradient descent on distance-based neural models implicitly performs expectation-maximization (EM). The result unifies several learning regimes and explains Bayesian-like behavior as a direct consequence of the objective's geometry.
Contribution
The paper gives a direct algebraic derivation showing that gradient descent acts as implicit EM in distance-based models, placing different learning paradigms under a single mechanism.
Findings
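The gradient of a log-sum-exp objective with respect to each distance equals the negative responsibility of the corresponding component.
Gradient descent on such objectives performs implicit expectation-maximization.
Bayesian structure observed in transformers follows from the geometry of the objective.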
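The first finding can be checked numerically. The sketch below assumes the objective has the form L(d) = log sum_j exp(-d_j) and defines the responsibility as r_j = exp(-d_j) / sum_k exp(-d_k). Both the sign convention and the function names are assumptions made for illustration, not taken from the paper. Under that assumption, differentiating gives dL/dd_j = -exp(-d_j) / sum_k exp(-d_k) = -r_j, and a finite-difference gradient should match -r to numerical precision.

```python
import numpy as np

def log_sum_exp_objective(d):
    """Assumed objective L(d) = log sum_j exp(-d_j), computed stably."""
    m = np.max(-d)
    return m + np.log(np.sum(np.exp(-d - m)))

def responsibilities(d):
    """r_j = exp(-d_j) / sum_k exp(-d_k): softmax of the negated distances."""
    z = np.exp(-d - np.max(-d))
    return z / z.sum()

def finite_difference_gradient(f, d, eps=1e-6):
    """Central-difference estimate of dL/dd_j for each component j."""
    grad = np.zeros_like(d)
    for j in range(d.size):
        step = np.zeros_like(d)
        step[j] = eps
        grad[j] = (f(d + step) - f(d - step)) / (2 * eps)
    return grad

# Arbitrary example distances (illustrative values only).
d = np.array([0.5, 1.2, 3.0, 0.1])
r = responsibilities(d)
grad = finite_difference_gradient(log_sum_exp_objective, d)

# If the identity holds, grad + r is zero up to finite-difference error.
max_abs_error = np.max(np.abs(grad + r))
print("largest deviation from dL/dd_j = -r_j:", max_abs_error)
```

If the loss is instead written as the negative of this objective, the same calculation gives a gradient of +r_j, so the sign in the identity depends on which convention is used.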
Abstract
Neural networks trained with standard objectives exhibit behaviors characteristic of probabilistic inference: soft clustering, prototype specialization, and Bayesian uncertainty tracking. These phenomena appear across architectures -- in attention mechanisms, classification heads, and energy-based models -- yet existing explanations rely on loose analogies to mixture models or post-hoc architectural interpretation. We provide a direct derivation. For any objective with log-sum-exp structure over distances or energies, the gradient with respect to each distance is exactly the negative posterior responsibility of the corresponding component: $\partial L / \partial d_j = -r_j$. This is an algebraic identity, not an approximation. The immediate consequence is that gradient descent on such objectives performs expectation-maximization implicitly -- responsibilities are not auxiliary variables…
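To make the "implicit EM" reading concrete, here is a minimal sketch under strong assumptions that are not stated in the abstract: prototypes mu_j, distances d_nj = 0.5 * ||x_n - mu_j||^2, and the objective L = sum_n log sum_j exp(-d_nj) (an equal-weight, unit-variance Gaussian mixture up to constants). By the chain rule and the identity above, the gradient with respect to mu_j is sum_n r_nj (x_n - mu_j), so the responsibilities (the E-step quantities) appear inside the gradient without being computed separately. With a per-component step size of 1 / sum_n r_nj, chosen here purely for illustration, a single step that increases L (equivalently, gradient descent on -L) lands on the classical EM mean update.

```python
import numpy as np

def responsibilities(X, mu):
    """r_nj = softmax_j(-d_nj) with assumed distances d_nj = 0.5 * ||x_n - mu_j||^2."""
    diff = X[:, None, :] - mu[None, :, :]            # shape (N, K, D)
    d = 0.5 * np.sum(diff ** 2, axis=2)              # shape (N, K)
    z = np.exp(-(d - d.min(axis=1, keepdims=True)))  # stabilised exponentials
    return z / z.sum(axis=1, keepdims=True)

def objective_gradient(X, mu):
    """Gradient of L = sum_n log sum_j exp(-d_nj) with respect to each prototype.

    dL/dmu_j = sum_n (dL/dd_nj) * (dd_nj/dmu_j)
             = sum_n (-r_nj) * (-(x_n - mu_j))
             = sum_n r_nj * (x_n - mu_j)
    """
    r = responsibilities(X, mu)
    grad = r.T @ X - r.sum(axis=0)[:, None] * mu
    return grad, r

def em_mean_update(X, r):
    """Classical EM M-step for the means: responsibility-weighted averages."""
    return (r.T @ X) / r.sum(axis=0)[:, None]

# Synthetic two-cluster data and random initial prototypes (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X[:100] += 3.0
mu = rng.normal(size=(3, 2))

grad, r = objective_gradient(X, mu)
effective_counts = r.sum(axis=0)[:, None]   # N_j = sum_n r_nj

mu_gradient_step = mu + grad / effective_counts   # one scaled step that increases L
mu_em = em_mean_update(X, r)

print("gradient step matches EM update:", np.allclose(mu_gradient_step, mu_em))
```

With an ordinary small learning rate the step is shorter, but it still moves each prototype toward its responsibility-weighted mean, which is the sense in which the update behaves like EM in this simplified setting.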
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Gaussian Processes and Bayesian Inference
