Learning to Recognize Objects by Retaining other Factors of Variation

Jiaping Zhao; Chin-kai Chang; Laurent Itti

arXiv:1607.05851·cs.CV·January 24, 2017·1 cites

Open Access

TL;DR

This paper introduces a multi-task ConvNet called disCNN that explicitly learns to disentangle object identity and pose, leading to improved recognition accuracy and better generalization across datasets.

Contribution

The work presents a novel multi-task learning approach that explicitly models and learns disentangled representations of object identity and pose in ConvNets.

Findings

1. disCNN outperforms AlexNet in object recognition accuracy on iLab-20M.

2. Pretrained disCNN features generalize better to other datasets, such as Washington RGB-D and ImageNet.

3. Disentangled representations improve fine-tuning performance on large-scale datasets.

Abstract

Natural images are generated under many factors, including shape, pose, and illumination. Most existing ConvNets formulate object recognition from natural images as a single-task classification problem, and attempt to learn features that are useful for object categories but as invariant as possible to other factors of variation. These architectures do not explicitly learn other factors, like pose and lighting; instead, they usually discard them through pooling and normalization. In this work, we take the opposite approach: we train ConvNets for object recognition by retaining other factors (pose in our case) and learning them jointly with object category. We design a new multi-task learning (MTL) ConvNet, named disentangling CNN (disCNN), which explicitly enforces disentangled representations of object identity and pose, and is trained to predict object categories and pose transformations. We…
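
The joint objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the split of one feature vector into identity and pose units, the bias-free linear heads, the function names, and the `pose_weight` parameter are all assumptions made for illustration. It only shows the general idea of predicting category from one part of the representation and pose from the other, then summing the two losses.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of a softmax over `logits` against an integer class `label`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def split_features(features, n_identity):
    """Partition one feature vector into (identity units, pose units).

    Assumption: the first `n_identity` entries encode identity, the rest pose.
    """
    return features[:n_identity], features[n_identity:]


def linear(weights, inputs):
    """Apply a bias-free linear layer given as a list of weight rows."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def joint_loss(features, n_identity, w_category, w_pose,
               category_label, pose_label, pose_weight=1.0):
    """Sum of a category loss and a weighted pose loss (hypothetical weighting)."""
    identity_units, pose_units = split_features(features, n_identity)
    # Each head only sees its own part of the representation.
    category_logits = linear(w_category, identity_units)
    pose_logits = linear(w_pose, pose_units)
    loss_category = softmax_cross_entropy(category_logits, category_label)
    loss_pose = softmax_cross_entropy(pose_logits, pose_label)
    return loss_category + pose_weight * loss_pose


# Toy usage: 4 features (2 identity, 2 pose), 3 categories, 4 pose classes.
features = [0.5, -1.0, 2.0, 0.3]
w_category = [[0.0, 0.0] for _ in range(3)]
w_pose = [[0.0, 0.0] for _ in range(4)]
loss = joint_loss(features, 2, w_category, w_pose, category_label=1, pose_label=2)
```

With all-zero weights both heads output uniform predictions, so the toy loss reduces to the sum of the two uniform cross-entropies. In a real network the features would come from shared convolutional layers and both heads would be trained jointly by backpropagation.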

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax