Learning to Recognize Objects by Retaining other Factors of Variation
Jiaping Zhao, Chin-kai Chang, Laurent Itti

TL;DR
This paper introduces a multi-task ConvNet called disCNN that explicitly learns to disentangle object identity and pose, leading to improved recognition accuracy and better generalization across datasets.
Contribution
The work presents a novel multi-task learning approach that explicitly models and learns disentangled representations of object identity and pose in ConvNets.
Findings
disCNN outperforms AlexNet in object recognition accuracy on iLab-20M
Pretrained disCNN features generalize better to other datasets like Washington RGB-D and ImageNet
Disentangled representations improve fine-tuning performance on large-scale datasets
Abstract
Natural images are generated under many factors, including shape, pose, illumination, etc. Most existing ConvNets formulate object recognition from natural images as a single-task classification problem, and attempt to learn features useful for object categories, but invariant to other factors of variation as much as possible. These architectures do not explicitly learn other factors, like pose and lighting; instead, they usually discard them by pooling and normalization. In this work, we take the opposite approach: we train ConvNets for object recognition by retaining other factors (pose in our case) and learn them jointly with object category. We design a new multi-task learning (MTL) ConvNet, named disentangling CNN (disCNN), which explicitly enforces the disentangled representations of object identity and pose, and is trained to predict object categories and pose transformations. We…
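To make the multi-task objective described in the abstract more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that a feature vector is partitioned into identity units and pose units, that the category is predicted from the identity units of one image, and that the pose transformation is predicted from the concatenated pose units of an image pair. The function names, the bias-free linear heads, the `n_identity` split point, and the `pose_weight` factor are all hypothetical choices made for illustration.

```python
import math
import random


def softmax_cross_entropy(logits, label):
    """Negative log-probability of `label` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def linear(weights, x):
    """Bias-free linear layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def split_features(features, n_identity):
    """Partition a feature vector into (identity units, pose units)."""
    return features[:n_identity], features[n_identity:]


def joint_loss(feat_a, feat_b, category, pose_change,
               w_cat, w_pose, n_identity, pose_weight=1.0):
    """Category loss on image A plus weighted pose-transformation loss on the pair.

    All arguments are illustrative assumptions, not taken from the paper.
    """
    id_a, pose_a = split_features(feat_a, n_identity)
    _, pose_b = split_features(feat_b, n_identity)
    # Only identity units feed the category head.
    cat_loss = softmax_cross_entropy(linear(w_cat, id_a), category)
    # Only pose units (from both images) feed the pose-transformation head.
    pose_loss = softmax_cross_entropy(linear(w_pose, pose_a + pose_b), pose_change)
    return cat_loss + pose_weight * pose_loss


if __name__ == "__main__":
    random.seed(0)
    n_identity, n_pose, n_categories, n_pose_changes = 4, 2, 3, 5
    feat_a = [random.gauss(0, 1) for _ in range(n_identity + n_pose)]
    feat_b = [random.gauss(0, 1) for _ in range(n_identity + n_pose)]
    w_cat = [[random.gauss(0, 0.1) for _ in range(n_identity)]
             for _ in range(n_categories)]
    w_pose = [[random.gauss(0, 0.1) for _ in range(2 * n_pose)]
              for _ in range(n_pose_changes)]
    print(joint_loss(feat_a, feat_b, category=1, pose_change=2,
                     w_cat=w_cat, w_pose=w_pose, n_identity=n_identity))
```

Because each head sees only its own slice of the features, gradients from the category loss would reach only the identity units and gradients from the pose loss only the pose units, which is one simple way to encourage the two groups to specialize.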
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax
