What is the Best Feature Learning Procedure in Hierarchical Recognition Architectures?
Kevin Jarrett, Koray Kavukcuoglu, Karol Gregor, Yann LeCun

TL;DR
This paper empirically compares different feature learning procedures for hierarchical recognition architectures, introducing a new supervised method that achieves high recognition rates without pre-training.
Contribution
The paper introduces a new single-phase supervised learning procedure with an L1 penalty, and DPSD, an augmentation of Predictive Sparse Decomposition with a discriminative term, improving performance over traditional two-phase training.
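To make the single-phase idea concrete, here is a minimal sketch of a supervised loss with an added L1 sparsity term. It is not the authors' implementation: it assumes the penalty is applied to each hidden layer's output activations, and it uses small dense ReLU layers as a stand-in for the paper's convolutional stages. The names `forward`, `cross_entropy`, `l1_penalized_loss`, and the weight `lam` are hypothetical.

```python
import numpy as np

def forward(params, x):
    """Run a small dense network; return logits and each hidden layer's output."""
    states = []
    h = x
    for W, b in params[:-1]:
        # Assumption: ReLU dense layers stand in for the paper's convolutional stages.
        h = np.maximum(0.0, h @ W + b)
        states.append(h)
    W, b = params[-1]
    return h @ W + b, states

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed with a numerically stable log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def l1_penalized_loss(params, x, labels, lam):
    """Supervised loss plus lam times the L1 norm of every hidden layer's output.

    Both terms are minimized together in a single training phase,
    so no separate unsupervised pre-training step is involved.
    """
    logits, states = forward(params, x)
    # Per-example L1 norm of each hidden state, averaged over the batch.
    sparsity = sum(np.abs(s).sum(axis=1).mean() for s in states)
    return cross_entropy(logits, labels) + lam * sparsity

# Toy data: 4 examples, 8 input features, 3 classes (illustrative values only).
rng = np.random.default_rng(0)
params = [
    (rng.normal(size=(8, 16)) * 0.1, np.zeros(16)),
    (rng.normal(size=(16, 16)) * 0.1, np.zeros(16)),
    (rng.normal(size=(16, 3)) * 0.1, np.zeros(3)),
]
x = rng.normal(size=(4, 8))
labels = np.array([0, 2, 1, 0])

loss_plain = l1_penalized_loss(params, x, labels, lam=0.0)
loss_sparse = l1_penalized_loss(params, x, labels, lam=0.1)
```

Setting `lam=0.0` recovers the purely supervised loss, so the same function covers both the penalized and unpenalized baselines; in practice the gradient of this loss would be taken with an automatic differentiation library.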
Findings
DPSD with lateral inhibition and multi-scale pooling achieves 70.6% on Caltech-101.
Single-phase supervised learning with L1 penalty achieves 77% on CIFAR-10.
Supervised training can outperform unsupervised pre-training in hierarchical recognition models.
Abstract
(This paper was written in November 2011 and never published. It is posted on arXiv.org in its original form in June 2016). Many recent object recognition systems have proposed using a two phase training procedure to learn sparse convolutional feature hierarchies: unsupervised pre-training followed by supervised fine-tuning. Recent results suggest that these methods provide little improvement over purely supervised systems when the appropriate nonlinearities are included. This paper presents an empirical exploration of the space of learning procedures for sparse convolutional networks to assess which method produces the best performance. In our study, we introduce an augmentation of the Predictive Sparse Decomposition method that includes a discriminative term (DPSD). We also introduce a new single phase supervised learning procedure that places an L1 penalty on the output state of each…
Taxonomy
Topics: Remote-Sensing Image Classification · Domain Adaptation and Few-Shot Learning · Sparse and Compressive Sensing Techniques
