Gradient flow in parameter space is equivalent to linear interpolation in output space

Thomas Chen; Patrícia Muñoz Ewald

arXiv:2408.01517 · cs.LG · February 2, 2026

TL;DR

This paper shows that gradient flow in parameter space can be deformed into a flow that, for the L2 loss with a full-rank Jacobian, becomes linear interpolation in output space, and uses this to derive results on global minima in deep learning training.

Contribution

It establishes a theoretical equivalence between gradient flow in parameter space and linear interpolation in output space, and gives an explicit formula for the global minimum of the cross-entropy loss when the Jacobian is full rank and the labels have positive components.
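To make the mechanism concrete, here is a minimal NumPy sketch of one way an adapted parameter-space flow can produce Euclidean gradient flow in output space. It assumes the Jacobian D = ∂x/∂θ has full row rank, so that D Dᵀ is invertible. The choice θ' = -Dᵀ(D Dᵀ)⁻¹∇ₓC is an illustrative assumption, not claimed to be the paper's exact construction; the toy model x(θ) = tanh(Aθ), the matrix A, and the target y are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model (not from the paper): outputs x(theta) = tanh(A @ theta).
# More parameters (P) than output components (N), so D can have full row rank.
N, P = 3, 8
A = rng.normal(size=(N, P)) / np.sqrt(P)
y = np.array([0.5, -0.2, 0.1])  # hypothetical target outputs, inside tanh's range

def outputs(theta):
    return np.tanh(A @ theta)

def jacobian(theta):
    # D = dx/dtheta = diag(1 - tanh^2(A theta)) @ A, shape (N, P)
    return (1.0 - np.tanh(A @ theta) ** 2)[:, None] * A

def grad_x(x):
    # Gradient of the L2 cost C(x) = 0.5 * |x - y|^2 with respect to the outputs
    return x - y

def adapted_velocity(theta):
    # Assumed adapted flow: theta' = -D^T (D D^T)^{-1} grad_x C,
    # chosen so that x' = D theta' = -grad_x C (Euclidean flow in output space)
    D = jacobian(theta)
    g = grad_x(outputs(theta))
    return -D.T @ np.linalg.solve(D @ D.T, g)

def standard_velocity(theta):
    # Plain parameter-space gradient flow: theta' = -D^T grad_x C
    D = jacobian(theta)
    return -D.T @ grad_x(outputs(theta))

theta = rng.normal(size=P)
x = outputs(theta)
D = jacobian(theta)
print(np.allclose(D @ adapted_velocity(theta), -grad_x(x)))   # True
print(np.allclose(D @ standard_velocity(theta), -grad_x(x)))  # generally False
```

By the chain rule x' = Dθ', so the adapted velocity gives x' = -∇ₓC exactly, while plain gradient flow gives x' = -D Dᵀ∇ₓC, which is distorted by the matrix D Dᵀ.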

Findings

01

Gradient flow in parameter space can be continuously deformed into an adapted flow that yields (constrained) Euclidean gradient flow in output space.

02

For the L2 loss, a full-rank Jacobian allows the time variable to be reparametrized so that the flow becomes linear interpolation, and a global minimum is achieved.

03

For the cross-entropy loss, under the same rank condition and with positive label components, an explicit formula is derived for the unique global minimum.

Abstract

We prove that the standard gradient flow in parameter space that underlies many training algorithms in deep learning can be continuously deformed into an adapted gradient flow which yields (constrained) Euclidean gradient flow in output space. Moreover, for the $L^{2}$ loss, if the Jacobian of the outputs with respect to the parameters is full rank (for fixed training data), then the time variable can be reparametrized so that the resulting flow is simply linear interpolation, and a global minimum can be achieved. For the cross-entropy loss, under the same rank condition and assuming the labels have positive components, we derive an explicit formula for the unique global minimum.
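The L2 statement can be checked in a few lines. As a simplification that ignores the constraint mentioned above, assume the adapted flow acts in output space as x' = -(x - y), the Euclidean gradient flow of C(x) = ½|x - y|². Its solution is x(t) = y + e^{-t}(x₀ - y), which moves along a straight line; substituting s = 1 - e^{-t} gives x = (1 - s)x₀ + s y. The sketch below integrates the flow numerically and compares it with that interpolation. The reparametrization s = 1 - e^{-t} and the sample vectors are illustrative choices, not necessarily the paper's.

```python
import numpy as np

def euclidean_flow_l2(x0, y, t_final, steps):
    """Forward-Euler integration of x' = -(x - y), the Euclidean gradient
    flow of C(x) = 0.5 * |x - y|^2 in output space."""
    x = np.array(x0, dtype=float)
    dt = t_final / steps
    for _ in range(steps):
        x = x - dt * (x - y)
    return x

def linear_interpolation(x0, y, s):
    """Straight-line path from x0 (s = 0) to the target y (s = 1)."""
    return (1.0 - s) * np.asarray(x0, dtype=float) + s * np.asarray(y, dtype=float)

x0 = np.array([2.0, -1.0, 0.5])  # hypothetical initial outputs
y = np.array([0.0, 1.0, 1.0])    # hypothetical target outputs
t = 1.5
s = 1.0 - np.exp(-t)  # reparametrized time: s in [0, 1) as t in [0, inf)

x_flow = euclidean_flow_l2(x0, y, t, steps=100_000)
x_line = linear_interpolation(x0, y, s)
print(np.max(np.abs(x_flow - x_line)))  # small, and it shrinks as steps grows
```

Under this particular reparametrization, s ranges over [0, 1) as t ranges over [0, ∞), so the target y is attained in the limit s → 1; the paper's own time change may be chosen differently.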