Convergence of gradient based pre-training in Denoising autoencoders
Vamsi K Ithapu, Sathya Ravi, Vikas Singh

TL;DR
This paper analyzes the convergence properties of gradient-based pre-training in denoising autoencoders, providing theoretical guarantees and empirical validation for both classical and distributed settings.
Contribution
It offers the first theoretical analysis of convergence rates for autoencoder pre-training and demonstrates improved convergence in distributed training.
Findings
Gradient converges at rate 1/√N
Convergence depends sub-linearly on network size
Distributed pre-training improves convergence by at least τ^{3/4}
Abstract
The success of deep architectures is at least in part attributed to the layer-by-layer unsupervised pre-training that initializes the network. Various papers have reported extensive empirical analysis focusing on the design and implementation of good pre-training procedures. However, an understanding pertaining to the consistency of parameter estimates, the convergence of learning procedures and the sample size estimates is still unavailable in the literature. In this work, we study pre-training in classical and distributed denoising autoencoders with these goals in mind. We show that the gradient converges at the rate of 1/√N and has a sub-linear dependence on the size of the autoencoder network. In a distributed setting where disjoint sections of the whole network are pre-trained synchronously, we show that the convergence improves by at least τ^{3/4}, where…
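For readers who want a concrete picture of what "gradient-based pre-training of a denoising autoencoder" involves, the following is a minimal sketch and not the authors' procedure or experimental setup. It assumes a single hidden layer with tied weights, sigmoid units, masking noise, a squared reconstruction loss, and a constant step size; the names `pretrain`, `da_gradient`, `da_loss`, `corruption`, `lr`, and `steps` are all hypothetical. The loop records the squared norm of the stochastic gradient at every step, which is the kind of quantity a gradient convergence rate describes. It does not attempt to verify the 1/√N rate stated above.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(W, b, c, x_noisy):
    """Encode the corrupted input, then decode with the transposed (tied) weights."""
    d, h = len(c), len(b)
    hid = [sigmoid(sum(W[j][i] * x_noisy[i] for i in range(d)) + b[j]) for j in range(h)]
    out = [sigmoid(sum(W[j][i] * hid[j] for j in range(h)) + c[i]) for i in range(d)]
    return hid, out


def da_loss(W, b, c, x, x_noisy):
    """Squared reconstruction error measured against the clean input x."""
    _, out = forward(W, b, c, x_noisy)
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, x))


def da_gradient(W, b, c, x, x_noisy):
    """Backpropagated gradient of da_loss with respect to W, b and c."""
    d, h = len(c), len(b)
    hid, out = forward(W, b, c, x_noisy)
    delta_out = [(out[i] - x[i]) * out[i] * (1.0 - out[i]) for i in range(d)]
    delta_hid = [
        sum(W[j][i] * delta_out[i] for i in range(d)) * hid[j] * (1.0 - hid[j])
        for j in range(h)
    ]
    # Tied weights: each W[j][i] receives a decoder term and an encoder term.
    gW = [
        [delta_out[i] * hid[j] + delta_hid[j] * x_noisy[i] for i in range(d)]
        for j in range(h)
    ]
    return gW, delta_hid, delta_out


def pretrain(data, hidden=3, steps=2000, lr=0.5, corruption=0.3, seed=0):
    """Plain stochastic gradient descent; returns parameters and squared gradient norms."""
    rng = random.Random(seed)
    d = len(data[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(hidden)]
    b = [0.0] * hidden
    c = [0.0] * d
    sq_norms = []
    for _ in range(steps):
        x = rng.choice(data)
        # Masking noise (an assumption): zero each coordinate with probability `corruption`.
        x_noisy = [0.0 if rng.random() < corruption else v for v in x]
        gW, gb, gc = da_gradient(W, b, c, x, x_noisy)
        sq_norms.append(
            sum(g * g for row in gW for g in row)
            + sum(g * g for g in gb)
            + sum(g * g for g in gc)
        )
        for j in range(hidden):
            for i in range(d):
                W[j][i] -= lr * gW[j][i]
            b[j] -= lr * gb[j]
        for i in range(d):
            c[i] -= lr * gc[i]
    return (W, b, c), sq_norms


if __name__ == "__main__":
    # Toy binary data, purely illustrative.
    data = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0],
            [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
    _, sq_norms = pretrain(data)
    print("mean squared gradient norm, first 100 steps:", sum(sq_norms[:100]) / 100)
    print("mean squared gradient norm, last 100 steps:", sum(sq_norms[-100:]) / 100)
```

Comparing the averaged squared gradient norm early and late in the run gives a rough, noisy view of how the gradient behaves as training proceeds. Any quantitative comparison against the paper's bounds would need the exact assumptions and step-size choices from the paper itself.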
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image and Signal Denoising Methods · Music and Audio Processing
