TL;DR
This paper systematically evaluates deep regression models in vision tasks, revealing that network architecture changes have less impact than data pre-processing, and that general-purpose networks can achieve near state-of-the-art results.
Contribution
First comprehensive analysis of deep regression techniques, including statistical evaluation across multiple vision tasks and insights into the impact of data pre-processing.
Findings
Data pre-processing variability exceeds architecture variability.
General-purpose networks like VGG-16 and ResNet-50 perform near state-of-the-art.
Statistical confidence intervals provided for performance metrics.
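Reporting a confidence interval instead of a single best score is what makes comparisons between noisy, stochastic training runs meaningful. The sketch below shows one common way to do this: a two-sided 95% Student-t interval over the test errors of repeated runs of the same configuration. This summary does not say which procedure the paper uses, so the t-interval, the helper name `confidence_interval_95`, and the example errors are all illustrative assumptions, not the authors' method or results.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, indexed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def confidence_interval_95(errors):
    """Hypothetical helper: 95% t-interval for the mean error over repeated runs."""
    n = len(errors)
    if n < 2:
        raise ValueError("need at least two runs to estimate variability")
    mean = statistics.mean(errors)
    sem = statistics.stdev(errors) / math.sqrt(n)  # standard error of the mean
    df = n - 1
    # Assumption: for more than 10 runs, fall back to the normal approximation.
    crit = T_CRIT_95.get(df, statistics.NormalDist().inv_cdf(0.975))
    return mean, mean - crit * sem, mean + crit * sem


# Illustrative (made-up) test errors from five runs with different random seeds.
run_errors = [5.0, 5.2, 4.8, 5.1, 4.9]
mean, lo, hi = confidence_interval_95(run_errors)
print(f"mean error {mean:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

If the intervals of two configurations overlap heavily, the data give little evidence that one outperforms the other, which is the kind of distinction the findings above rely on.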
Abstract
Deep learning revolutionized data science, and recently its popularity has grown exponentially, as has the number of papers employing deep networks. Vision tasks, such as human pose estimation, did not escape this trend. There is a large number of deep models, where small changes in the network architecture or in the data pre-processing, together with the stochastic nature of the optimization procedures, produce notably different results, making it extremely difficult to sift out methods that significantly outperform others. This situation motivates the current study, in which we perform a systematic evaluation and statistical analysis of vanilla deep regression, i.e., convolutional neural networks with a linear regression top layer. This is the first comprehensive analysis of deep regression techniques. We perform experiments on four vision problems, and report confidence intervals for…
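"Vanilla deep regression" means a convolutional feature extractor whose final layer is a plain linear map to real-valued targets (for example, keypoint coordinates in pose estimation), trained with a regression loss. The PyTorch sketch below illustrates that structure. The small backbone, the class name `DeepRegressor`, the target dimension, the L2 (`MSELoss`) objective, and the SGD settings are assumptions chosen for a compact, self-contained example; the paper works with general-purpose networks such as VGG-16 and ResNet-50.

```python
import torch
from torch import nn


class DeepRegressor(nn.Module):
    """Hypothetical toy model: CNN features followed by a linear regression layer."""

    def __init__(self, out_dim: int):
        super().__init__()
        # Small stand-in backbone (assumption); VGG-16 or ResNet-50 would go here.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (batch, 32, 1, 1)
            nn.Flatten(),             # -> (batch, 32)
        )
        # Linear regression top layer: no activation, outputs are unbounded reals.
        self.head = nn.Linear(32, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(x))


torch.manual_seed(0)
model = DeepRegressor(out_dim=4)        # e.g. two (x, y) points, illustrative only
images = torch.randn(8, 3, 64, 64)      # random stand-in for a batch of images
targets = torch.randn(8, 4)             # random stand-in for regression targets

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()                  # L2 regression loss (assumption)

# One end-to-end training step: backbone and linear head are updated together.
preds = model(images)
loss = loss_fn(preds, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(preds.shape, loss.item())
```

With a pretrained torchvision ResNet-50, the same idea amounts to replacing its final fully connected layer (`model.fc`) with an `nn.Linear(model.fc.in_features, out_dim)`; the abstract's point is that this generic recipe, not architectural novelty, is the baseline under study.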
Taxonomy
Methods: Linear Regression
