Convergence of gradient flow for learning convolutional neural networks
Jona-Maria Diederen, Holger Rauhut, Ulrich Terstiege

TL;DR
This paper analyzes gradient flow for training linear convolutional neural networks. It shows that, for certain loss functions including the square loss, gradient flow on the empirical risk always converges to a critical point under a mild condition on the training data.
Contribution
A theoretical convergence analysis of gradient flow in the simplified setting of linear convolutional networks, as a step toward understanding the training dynamics of more complex models.
Findings
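Gradient flow applied to the empirical risk of a linear CNN always converges to a critical point.
The result covers certain loss functions, including the square loss, and requires only a mild condition on the training data.
The empirical risk is non-convex even in the linear setting, so the result is a starting point for analyzing non-convex optimization in CNNs.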
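For orientation, the objects named in the findings can be written out as follows. The notation is an assumption made for this summary and is not taken from the paper: \theta = (w_1, \dots, w_N) collects the filters, f_\theta is the linear convolutional network, \ell is the loss, and (x_i, y_i) are the m training pairs.

```latex
% Assumed notation: a linear convolutional network composes N convolutions.
f_\theta(x) = w_N * w_{N-1} * \cdots * w_1 * x ,
\qquad
L(\theta) = \sum_{i=1}^{m} \ell\bigl(f_\theta(x_i),\, y_i\bigr).

% Gradient flow: the continuous-time analogue of gradient descent.
\frac{d}{dt}\,\theta(t) = -\nabla L\bigl(\theta(t)\bigr),
\qquad \theta(0) = \theta_0 .

% Convergence to a critical point means the limit exists and the gradient vanishes there.
\lim_{t \to \infty} \theta(t) = \theta^{*},
\qquad
\nabla L(\theta^{*}) = 0 .
```

A critical point need not be a global minimizer. The findings above assert convergence of the trajectory, not optimality of its limit.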
Abstract
Convolutional neural networks are widely used in imaging and image recognition. Learning such networks from training data leads to the minimization of a non-convex function. This makes the analysis of standard optimization methods such as variants of (stochastic) gradient descent challenging. In this article we study the simplified setting of linear convolutional networks. We show that the gradient flow (to be interpreted as an abstraction of gradient descent) applied to the empirical risk defined via certain loss functions including the square loss always converges to a critical point, under a mild condition on the training data.
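The setting in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' construction. It assumes a two-layer network of 1D circular convolutions with filters of length 3, the square loss, synthetic data generated by hypothetical "teacher" filters, and an explicit Euler discretization of gradient flow (that is, gradient descent with a small step). All function names and parameter values are illustrative.

```python
import numpy as np

def circ_conv(a, b):
    # Circular convolution along the last axis, computed via the FFT.
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

def circ_corr(a, r):
    # Adjoint of the linear map b -> circ_conv(a, b) for real a.
    return np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(r)).real

def pad(w, n):
    # Embed a short filter into a length-n vector (zeros outside its support).
    out = np.zeros(n)
    out[: len(w)] = w
    return out

def risk_and_grads(w1, w2, X, Y):
    # Square-loss empirical risk of x -> w2 * (w1 * x) and its gradients.
    n = X.shape[1]
    p1, p2 = pad(w1, n), pad(w2, n)
    H = circ_conv(p1, X)           # hidden layer, one row per sample
    R = circ_conv(p2, H) - Y       # residuals
    loss = 0.5 * np.sum(R ** 2)
    # Each gradient applies the adjoint convolution to the residual,
    # sums over samples, and keeps only the entries on the filter support.
    g2 = circ_corr(H, R).sum(axis=0)[: len(w2)]
    g1 = circ_corr(circ_conv(p2, X), R).sum(axis=0)[: len(w1)]
    return loss, g1, g2

def gradient_flow_euler(w1, w2, X, Y, step=5e-4, n_steps=4000):
    # Explicit Euler steps for d(theta)/dt = -grad L(theta).
    # This approximates gradient flow; it is not the flow itself.
    losses = []
    for _ in range(n_steps):
        loss, g1, g2 = risk_and_grads(w1, w2, X, Y)
        losses.append(loss)
        w1 = w1 - step * g1
        w2 = w2 - step * g2
    loss, g1, g2 = risk_and_grads(w1, w2, X, Y)
    losses.append(loss)
    grad_norm = np.sqrt(g1 @ g1 + g2 @ g2)
    return w1, w2, np.array(losses), grad_norm

# Synthetic data (assumption): targets come from hypothetical teacher filters.
rng = np.random.default_rng(0)
n, m, k = 8, 4, 3
X = rng.standard_normal((m, n))
teacher1 = np.array([1.0, -0.5, 0.25])
teacher2 = np.array([0.5, 1.0, -0.5])
Y = circ_conv(pad(teacher2, n), circ_conv(pad(teacher1, n), X))

w1_0 = 0.3 * rng.standard_normal(k)
w2_0 = 0.3 * rng.standard_normal(k)
w1, w2, losses, grad_norm = gradient_flow_euler(w1_0, w2_0, X, Y)
print("initial risk:", losses[0])
print("final risk:", losses[-1])
print("final gradient norm:", grad_norm)
```

A small gradient norm at the end of the run is consistent with the trajectory approaching a critical point, but a finite simulation of this kind is only an illustration. The convergence guarantee in the abstract concerns the continuous-time flow and is established analytically.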
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Gaussian Processes and Bayesian Inference
