A Mathematical Principle of Deep Learning: Learn the Geodesic Curve in the Wasserstein Space
Kuo Gai, Shihua Zhang

TL;DR
This paper establishes a mathematical principle for deep learning, showing that neural networks learn geodesic curves in Wasserstein space, which explains their optimization and generalization capabilities, especially highlighting ResNet's advantages.
Contribution
The paper introduces a novel connection between deep neural networks and the geodesic curves in Wasserstein space, providing a theoretical foundation for understanding DNN optimization and generalization.
Findings
ResNet better approximates the geodesic curve in Wasserstein space.
ResNet's learned transport map is closer to the optimal transport map.
Numerical experiments confirm the theoretical insights about DNNs learning geodesic curves.
Abstract
Recent studies revealed the mathematical connection between deep neural networks (DNNs) and dynamical systems. However, the fundamental principle of DNNs has not been fully characterized with dynamical systems in terms of optimization and generalization. To this end, we build the connection between DNNs and the continuity equation, in which the measure is conserved, to model the forward propagation process of a DNN, which has not been addressed before. A DNN learns the transformation of the input distribution to the output one. However, in the measure space, there are infinitely many curves connecting two distributions. Which one can lead to good optimization and generalization for a DNN? By diving into optimal transport theory, we find that a DNN with weight decay attempts to learn the geodesic curve in the Wasserstein space, which is induced by the optimal transport map. Compared with a plain network, ResNet is a better approximation to…
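To make the phrase "geodesic curve in the Wasserstein space, which is induced by the optimal transport map" concrete, here is a minimal sketch in one dimension. It is not the paper's method or code. It assumes two equal-size 1D sample sets and quadratic cost, where the optimal transport map is known to be the monotone (sorted) matching. The function names `w2_1d`, `ot_map_1d`, and `geodesic_point` are hypothetical, chosen only for illustration.

```python
import numpy as np

def w2_1d(x, y):
    """2-Wasserstein distance between two equal-size 1D empirical measures.

    In 1D with quadratic cost, the optimal coupling matches sorted samples.
    """
    return np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2))

def ot_map_1d(x, y):
    """Image T(x_i) of each sample under the monotone rearrangement onto y."""
    ranks = np.argsort(np.argsort(x))  # rank of each x_i within x
    return np.sort(y)[ranks]           # k-th smallest x goes to k-th smallest y

def geodesic_point(x, y, t):
    """Samples of the displacement interpolation (1 - t) * x + t * T(x)."""
    return (1.0 - t) * x + t * ot_map_1d(x, y)

# Illustrative data (an assumption, not taken from the paper).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=500)   # "input" distribution
y = rng.normal(3.0, 0.5, size=500)   # "output" distribution

total = w2_1d(x, y)
for t in (0.25, 0.5, 0.75):
    xt = geodesic_point(x, y, t)
    # On a geodesic, the distance travelled grows linearly in t.
    print(t, w2_1d(x, xt) / total)
```

The printed ratios should track `t`, which is the constant-speed property that distinguishes the geodesic from the other curves connecting the two distributions. In higher dimensions the optimal map has no sorting shortcut, so this sketch only illustrates the definition.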
Taxonomy
Topics: Model Reduction and Neural Networks · Hydraulic Fracturing and Reservoir Analysis · Seismic Imaging and Inversion Techniques
Methods: Residual Connection · Batch Normalization · 1x1 Convolution · Bottleneck Residual Block · Residual Block · Average Pooling · Max Pooling · Kaiming Initialization · Convolution
