Convergence of backpropagation with momentum for network architectures with skip connections
Chirag Agarwal, Joe Klobusicky, and Dan Schonfeld

TL;DR
This paper proves that backpropagation with adaptive momentum converges for deep neural networks whose architecture forms a directed acyclic graph (DAG), and illustrates the effectiveness of such architectures with an autoencoder compression example.
Contribution
It extends convergence results to deep DAG architectures with skip connections, generalizing the result of Wu et al. (2008) for feed-forward networks with one hidden layer.
Findings
Under gradient descent with adaptive momentum, weights converge for a large class of nonlinear activation functions
DAG architectures outperform sequential networks in compression tasks
Autoencoders with skip connections are effective for data compression
Abstract
We study a class of deep neural networks with architectures that form a directed acyclic graph (DAG). For backpropagation defined by gradient descent with adaptive momentum, we show that the weights converge for a large class of nonlinear activation functions. The proof generalizes the results of Wu et al. (2008), who showed convergence for a feed-forward network with one hidden layer. To illustrate the effectiveness of DAG architectures, we describe an example of compression through an autoencoder and compare it against sequential feed-forward networks under several metrics.
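As a rough illustration of the setting described in the abstract, the sketch below runs gradient descent with a momentum term on a tiny scalar network whose graph includes one skip edge (from the input directly to the second hidden node). Everything here is an assumption made for illustration: the network, the data, the hyperparameters `eta` and `tau`, and in particular the rule for the momentum coefficient `tau_k`, which this summary does not specify. The rule used scales the previous step so that its norm equals `tau` times the current gradient norm, and sets the coefficient to zero when there is no previous step.

```python
import math

def forward(w, x):
    """Tiny DAG network: x -> h1 -> h2 -> y, plus a skip edge x -> h2."""
    w1, w2, w_skip, w3 = w
    h1 = math.tanh(w1 * x)
    h2 = math.tanh(w2 * h1 + w_skip * x)  # the w_skip * x term is the skip connection
    return h1, h2, w3 * h2

def loss_and_grad(w, xs, ts):
    """Mean of 0.5 * (y - t)^2 and its gradient, computed by backpropagation."""
    w1, w2, w_skip, w3 = w
    n = len(xs)
    loss = 0.0
    g = [0.0, 0.0, 0.0, 0.0]
    for x, t in zip(xs, ts):
        h1, h2, y = forward(w, x)
        d = y - t
        loss += 0.5 * d * d / n
        da2 = d * w3 * (1.0 - h2 * h2)    # gradient at the pre-activation of h2
        da1 = da2 * w2 * (1.0 - h1 * h1)  # gradient at the pre-activation of h1
        g[0] += da1 * x / n               # dE/dw1
        g[1] += da2 * h1 / n              # dE/dw2
        g[2] += da2 * x / n               # dE/dw_skip
        g[3] += d * h2 / n                # dE/dw3
    return loss, g

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def train(w, xs, ts, eta=0.1, tau=0.05, steps=200):
    """Gradient descent with an assumed adaptive momentum coefficient tau_k."""
    w = list(w)
    delta = [0.0] * len(w)  # previous weight update
    losses = []
    for _ in range(steps):
        loss, g = loss_and_grad(w, xs, ts)
        losses.append(loss)
        prev = norm(delta)
        # Assumed rule: tau_k * ||delta|| == tau * ||g||, and tau_k = 0 if delta = 0.
        tau_k = tau * norm(g) / prev if prev > 0.0 else 0.0
        delta = [-eta * gi + tau_k * di for gi, di in zip(g, delta)]
        w = [wi + di for wi, di in zip(w, delta)]
    losses.append(loss_and_grad(w, xs, ts)[0])
    return w, losses

# Hypothetical toy data: fit t = 0.5 * x on five points.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ts = [0.5 * x for x in xs]
w_final, losses = train([0.5, -0.3, 0.2, 0.4], xs, ts)
print(f"initial loss {losses[0]:.6f}, final loss {losses[-1]:.6f}")
```

With `tau` smaller than `eta`, the momentum term in this sketch can never outweigh the gradient term, so each update remains a descent direction. That is the kind of condition a convergence argument can exploit, though the paper's actual assumptions may differ.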
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
