Unsupervised Pretraining Encourages Moderate-Sparseness
Jun Li, Wei Luo, Jian Yang, Xiaotong Yuan

TL;DR
This paper argues that unsupervised pretraining improves neural network performance by inducing moderate sparsity in hidden unit activations, with pretraining acting as an adaptive sparse coding mechanism. Experiments on MNIST and Birdsong support this view.
Contribution
It reveals that pretraining encourages sparsity in neural networks, providing a new understanding of its effectiveness beyond regularization and optimization.
Findings
Pretraining leads to moderate sparsity in hidden units.
Pretrained models can be viewed as adaptive sparse coders.
Experimental results support the sparseness hypothesis on MNIST and Birdsong.
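To make "sparsity in hidden units" concrete, here is a minimal sketch of one way to score how sparse a vector of hidden activations is. It uses Hoyer's sparseness measure, which is 0 when all units are equally active and 1 when a single unit is active. The choice of this measure, the function names, and the toy activation values are assumptions for illustration and are not taken from this summary.

```python
import math


def hoyer_sparseness(activations):
    """Hoyer sparseness of an activation vector, a value in [0, 1].

    0 means every unit has the same magnitude; 1 means only one unit is non-zero.
    (Illustrative helper; not necessarily the measure used in the paper.)
    """
    n = len(activations)
    if n < 2:
        raise ValueError("need at least two units")
    l1 = sum(abs(a) for a in activations)
    l2 = math.sqrt(sum(a * a for a in activations))
    if l2 == 0.0:
        raise ValueError("undefined for an all-zero vector")
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)


def sigmoid(z):
    """Logistic sigmoid, the hidden-unit nonlinearity named in the abstract."""
    return 1.0 / (1.0 + math.exp(-z))


# Toy pre-activations (hypothetical values, chosen only to contrast two cases).
dense_layer = [sigmoid(0.0)] * 4                            # all units equally active
sparse_layer = [sigmoid(z) for z in (6.0, -6.0, -6.0, -6.0)]  # one unit dominates

print(hoyer_sparseness(dense_layer))
print(hoyer_sparseness(sparse_layer))
```

Under this measure, a layer whose score sits between the two extremes would count as moderately sparse; comparing the score of a pretrained network's hidden layer with that of a randomly initialized one is one way to probe the claim.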
Abstract
It is well known that direct training of deep neural networks generally leads to poor results. A major advance in recent years is the invention of various pretraining methods to initialize network parameters, and such methods have been shown to lead to good prediction performance. However, the reason for the success of pretraining is not fully understood, although it has been argued that regularization and better optimization play certain roles. This paper provides another explanation for the effectiveness of pretraining: we show that pretraining leads to sparseness of hidden unit activations in the resulting neural networks. The main reason is that pretraining models can be interpreted as adaptive sparse coding. Experimental comparisons with deep neural networks using the sigmoid function, on MNIST and Birdsong, further support this sparseness observation.
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and Data Classification · Gaussian Processes and Bayesian Inference
