On Pre-Trained Image Features and Synthetic Images for Deep Learning
Stefan Hinterstoisser, Vincent Lepetit, Paul Wohlhart, Kurt Konolige

TL;DR
This paper demonstrates that freezing pre-trained feature extraction layers and training only the remaining layers with synthetic images can effectively train modern object detectors, reducing reliance on real data.
Contribution
The authors show that a simple trick suffices to train modern object detectors on synthetic images only: freeze the feature-extraction layers at weights pre-trained on real images, and train only the remaining layers on plain OpenGL renderings.
Findings
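The method performs well across recent detection architectures (Faster-RCNN, R-FCN, Mask-RCNN) and image feature extractors (InceptionResnet, Resnet).
With the pre-trained feature extractor frozen, detectors can be trained on synthetic images alone; real images are used only to pre-train the feature extractor.
The approach simplifies the training pipeline for object detectors: plain OpenGL rendering supplies the training images, and the labels come for free.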
Abstract
Deep Learning methods usually require huge amounts of training data to perform at their full potential, and often require expensive manual labeling. Using synthetic images is therefore very attractive to train object detectors, as the labeling comes for free, and several approaches have been proposed to combine synthetic and real images for training. In this paper, we show that a simple trick is sufficient to train very effectively modern object detectors with synthetic images only: We freeze the layers responsible for feature extraction to generic layers pre-trained on real images, and train only the remaining layers with plain OpenGL rendering. Our experiments with very recent deep architectures for object recognition (Faster-RCNN, R-FCN, Mask-RCNN) and image feature extractors (InceptionResnet and Resnet) show this simple approach performs surprisingly well.
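The freezing trick described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the paper works with full detectors (Faster-RCNN, R-FCN, Mask-RCNN) on OpenGL renderings, whereas here a fixed random layer stands in for the pre-trained feature extractor, procedurally generated vectors stand in for synthetic images, and a single linear layer stands in for the remaining layers. All names (`features`, `train_head`, `frozen_w`) are hypothetical and chosen only for illustration.

```python
import random

random.seed(0)

def features(x, frozen_w):
    # Frozen feature extractor: a fixed linear map followed by ReLU.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in frozen_w]

def mse(data, frozen_w, head_w):
    # Mean squared error of the linear head on top of the frozen features.
    total = 0.0
    for x, y in data:
        f = features(x, frozen_w)
        pred = sum(h * fi for h, fi in zip(head_w, f))
        total += (pred - y) ** 2
    return total / len(data)

def train_head(data, frozen_w, head_w, lr=0.01, epochs=200):
    # Plain SGD on squared error. Only head_w is updated;
    # frozen_w is read but never written.
    head_w = list(head_w)
    for _ in range(epochs):
        for x, y in data:
            f = features(x, frozen_w)
            err = sum(h * fi for h, fi in zip(head_w, f)) - y
            head_w = [h - lr * err * fi for h, fi in zip(head_w, f)]
    return head_w

# Assumption: random fixed weights play the role of pre-trained features.
frozen_w = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
frozen_before = [row[:] for row in frozen_w]

# Assumption: procedurally generated samples play the role of synthetic
# images; targets come from a hidden head so the task is learnable.
true_head = [1.0, -2.0, 0.5, 3.0]
inputs = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(50)]
data = [(x, sum(t * f for t, f in zip(true_head, features(x, frozen_w))))
        for x in inputs]

head_init = [0.0] * 4
loss_before = mse(data, frozen_w, head_init)
head_trained = train_head(data, frozen_w, head_init)
loss_after = mse(data, frozen_w, head_trained)

print("feature weights unchanged:", frozen_w == frozen_before)
print("loss decreased:", loss_after < loss_before)
```

In a deep learning framework the same idea is usually expressed by excluding the feature-extractor parameters from the optimizer or marking them as non-trainable; the sketch only makes explicit which weights receive updates.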
Taxonomy
Methods: Convolution · Position-Sensitive RoI Pooling · Region-based Fully Convolutional Network
