Learning Canonical 3D Object Representation for Fine-Grained Recognition
Sunghun Joung, Seungryong Kim, Minsu Kim, Ig-Jae Kim, Kwanghoon Sohn

TL;DR
This paper introduces a framework for fine-grained object recognition that recovers 3D object variation from a single image. It learns shape and appearance variation without ground-truth 3D annotations and improves both recognition and 3D reconstruction performance.
Contribution
It presents a novel 3D-aware representation learning method that jointly models shape and appearance, enabling viewpoint-invariant recognition without 3D annotations.
Findings
Achieves competitive fine-grained recognition accuracy.
Improves 3D shape reconstruction quality.
Enhances shape deformation learning through boosting.
Abstract
We propose a novel framework for fine-grained object recognition that learns to recover object variation in 3D space from a single image, trained on an image collection without using any ground-truth 3D annotation. We accomplish this by representing an object as a composition of 3D shape and its appearance, while eliminating the effect of camera viewpoint, in a canonical configuration. Unlike conventional methods modeling spatial variation in 2D images only, our method is capable of reconfiguring the appearance feature in a canonical 3D space, thus enabling the subsequent object classifier to be invariant under 3D geometric variation. Our representation also allows us to go beyond existing methods, by incorporating 3D shape variation as an additional cue for object recognition. To learn the model without ground-truth 3D annotation, we deploy a differentiable renderer in an…
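The abstract does not spell out the architecture. The following is a minimal NumPy sketch of the core idea of "reconfiguring the appearance feature in a canonical 3D space", built on our own assumptions rather than the authors' implementation. It assumes a template mesh deformed per instance, a weak-perspective camera (scale, rotation, translation), and bilinear sampling of a 2D feature map at the projected vertices. All function names are hypothetical, and occlusion is ignored.

```python
import numpy as np


def project_weak_perspective(vertices, scale, rotation, translation):
    """Project canonical 3D vertices (V, 3) to 2D pixel coordinates (V, 2).

    Assumed camera model: rotate, drop depth, then scale and translate.
    """
    rotated = vertices @ rotation.T          # (V, 3) in the camera frame
    return scale * rotated[:, :2] + translation


def bilinear_sample(feature_map, points):
    """Sample an (H, W, C) feature map at continuous (x, y) points (V, 2).

    Points outside the image are clamped to the border.
    """
    H, W, _ = feature_map.shape
    x = np.clip(points[:, 0], 0, W - 1)
    y = np.clip(points[:, 1], 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = (1 - wx) * feature_map[y0, x0] + wx * feature_map[y0, x1]
    bottom = (1 - wx) * feature_map[y1, x0] + wx * feature_map[y1, x1]
    return (1 - wy) * top + wy * bottom


def canonical_descriptor(template, deformation, feature_map,
                         scale, rotation, translation):
    """Hypothetical viewpoint-invariant descriptor for a downstream classifier.

    Appearance is indexed by mesh vertex (canonical space), not by pixel,
    and the per-vertex deformation is appended as a 3D shape cue.
    """
    shape = template + deformation                      # instance shape, canonical frame
    uv = project_weak_perspective(shape, scale, rotation, translation)
    appearance = bilinear_sample(feature_map, uv)       # (V, C), one row per vertex
    return np.concatenate([appearance.ravel(), deformation.ravel()])
```

Because each appearance feature is tied to a mesh vertex rather than an image location, the same vertex yields the same feature whatever the camera pose. This is the property that would let a classifier on top ignore viewpoint. Appending `deformation` mirrors the abstract's use of 3D shape variation as an additional recognition cue. In the paper, camera, shape, and features would be predicted by networks and trained through a differentiable renderer. Here they are simply given as inputs.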
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis
