Convergence of gradient flow for learning convolutional neural networks

Jona-Maria Diederen; Holger Rauhut; Ulrich Terstiege

arXiv:2601.08547 · math.OC · January 14, 2026

TL;DR

This paper analyzes the convergence of gradient flow for training linear convolutional neural networks. It shows that, for certain loss functions including the square loss and under a mild condition on the training data, the gradient flow always converges to a critical point of the empirical risk.

Contribution

The paper gives a theoretical convergence analysis of gradient flow for linear convolutional networks, a simplified setting that serves as a step toward understanding the training dynamics of more complex models.

Findings

01 Gradient flow converges to a critical point in linear CNNs.

02 Convergence holds under mild conditions on training data.

03 Provides a foundation for analyzing non-convex optimization in CNNs.

Abstract

Convolutional neural networks are widely used in imaging and image recognition. Learning such networks from training data leads to the minimization of a non-convex function. This makes the analysis of standard optimization methods such as variants of (stochastic) gradient descent challenging. In this article we study the simplified setting of linear convolutional networks. We show that the gradient flow (to be interpreted as an abstraction of gradient descent) applied to the empirical risk defined via certain loss functions including the square loss always converges to a critical point, under a mild condition on the training data.
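
To make the setting concrete, the sketch below trains a toy two-layer linear convolutional network with the square loss, using plain gradient descent with a small step size as an explicit Euler discretization of the gradient flow d(theta)/dt = -grad L(theta). Everything specific here is an assumption for illustration and is not taken from the paper: the 1D circular convolution, the two-layer architecture, the synthetic "teacher" data, the step size `dt`, and all function names. It only illustrates the kind of non-convex empirical risk involved and makes no claim about the authors' construction or proof.

```python
import random


def circ_conv(w, x):
    """Circular convolution of a short filter w with a signal x."""
    n = len(x)
    return [sum(w[j] * x[(i - j) % n] for j in range(len(w))) for i in range(n)]


def circ_conv_grads(w, x, g):
    """Given g = dLoss/d(circ_conv(w, x)), return (dLoss/dw, dLoss/dx)."""
    n = len(x)
    gw = [sum(g[i] * x[(i - j) % n] for i in range(n)) for j in range(len(w))]
    gx = [0.0] * n
    for i in range(n):
        for j in range(len(w)):
            gx[(i - j) % n] += g[i] * w[j]
    return gw, gx


def risk_and_grads(w1, w2, data):
    """Square-loss empirical risk of x -> w2 * (w1 * x) and its filter gradients."""
    risk = 0.0
    g1 = [0.0] * len(w1)
    g2 = [0.0] * len(w2)
    for x, y in data:
        h = circ_conv(w1, x)            # first linear convolutional layer
        out = circ_conv(w2, h)          # second linear convolutional layer
        r = [o - t for o, t in zip(out, y)]
        risk += 0.5 * sum(v * v for v in r)
        gw2, gh = circ_conv_grads(w2, h, r)   # backpropagate through layer 2
        gw1, _ = circ_conv_grads(w1, x, gh)   # backpropagate through layer 1
        g1 = [a + b for a, b in zip(g1, gw1)]
        g2 = [a + b for a, b in zip(g2, gw2)]
    return risk, g1, g2


def run_gradient_flow(steps=3000, dt=1e-3, n=6, k=3, samples=4, seed=0):
    """Euler-discretized gradient flow; returns (risk history, final gradient norm)."""
    rng = random.Random(seed)

    def rand(m, scale):
        return [rng.uniform(-scale, scale) for _ in range(m)]

    # Hypothetical teacher filters used only to generate synthetic targets.
    t1, t2 = rand(k, 1.0), rand(k, 1.0)
    data = []
    for _ in range(samples):
        x = rand(n, 1.0)
        data.append((x, circ_conv(t2, circ_conv(t1, x))))

    w1, w2 = rand(k, 0.5), rand(k, 0.5)   # random initialization
    risks = []
    for _ in range(steps):
        risk, g1, g2 = risk_and_grads(w1, w2, data)
        risks.append(risk)
        # One explicit Euler step of d(theta)/dt = -grad L(theta).
        w1 = [w - dt * g for w, g in zip(w1, g1)]
        w2 = [w - dt * g for w, g in zip(w2, g2)]

    risk, g1, g2 = risk_and_grads(w1, w2, data)
    risks.append(risk)
    grad_norm = sum(g * g for g in g1 + g2) ** 0.5
    return risks, grad_norm
```

Calling `risks, grad_norm = run_gradient_flow()` and inspecting `risks[0]`, `risks[-1]`, and `grad_norm` shows how the risk and the gradient norm evolve along the discretized trajectory. A gradient norm near zero indicates proximity to a critical point, which need not be a global minimizer because the risk is non-convex in the filters.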

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Gaussian Processes and Bayesian Inference