Self-Supervised Multi-View Learning via Auto-Encoding 3D Transformations
Xiang Gao, Wei Hu, Guo-Jun Qi

TL;DR
This paper introduces a self-supervised learning method for 3D object recognition that leverages multi-view transformations to learn equivariant representations without labeled data, improving classification and retrieval performance.
Contribution
It proposes MV-TER (Multi-View Transformation Equivariant Representations), a novel self-supervised paradigm that learns 3D-transformation-equivariant features from multiple views of an object without requiring labels.
Findings
Outperforms state-of-the-art view-based methods in 3D classification
Demonstrates strong generalization to real-world datasets
Effective in 3D object retrieval tasks
Abstract
3D object representation learning is a fundamental challenge in computer vision for inferring the 3D world. Recent advances in deep learning have shown their effectiveness in 3D object recognition, among which view-based methods have performed best so far. However, feature learning of multiple views in existing methods is mostly performed in a supervised fashion, which often requires a large amount of costly data labels. In contrast, self-supervised learning aims to learn multi-view feature representations without involving labeled data. To this end, we propose a novel self-supervised paradigm to learn Multi-View Transformation Equivariant Representations (MV-TER), exploring the equivariant transformations of a 3D object and its projected multiple views. Specifically, we perform a 3D transformation on a 3D object, and obtain multiple views before and after the transformation via…
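The abstract's core step (apply a 3D transformation to an object, then take multiple views before and after it) can be illustrated as a routine that generates the self-supervised training signal. This is a minimal sketch under assumptions that do not come from the paper: the object is a point cloud, the transformation is a rotation given by three Euler angles, the "views" are orthographic projections from cameras spaced evenly around the vertical axis (a stand-in for rendered images), and the target is the rotation parameters scored with a mean-squared-error loss. All function names (`rotation_matrix`, `project_views`, `make_training_pair`, `transformation_loss`) are hypothetical, and the network that would encode the views and decode the transformation is omitted.

```python
import numpy as np


def rotation_matrix(angles):
    """Rotation from Euler angles (radians) about x, y, z, composed as Rz @ Ry @ Rx."""
    ax, ay, az = angles
    rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(ax), -np.sin(ax)],
                   [0.0, np.sin(ax), np.cos(ax)]])
    ry = np.array([[np.cos(ay), 0.0, np.sin(ay)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(ay), 0.0, np.cos(ay)]])
    rz = np.array([[np.cos(az), -np.sin(az), 0.0],
                   [np.sin(az), np.cos(az), 0.0],
                   [0.0, 0.0, 1.0]])
    return rz @ ry @ rx


def project_views(points, azimuths):
    """Hypothetical stand-in for rendering: orthographic views of an (N, 3) cloud.

    Each camera sits at a given azimuth around the z (up) axis; returns (V, N, 2).
    """
    views = []
    for az in azimuths:
        cam = rotation_matrix((0.0, 0.0, -az))  # move points into this camera's frame
        p = points @ cam.T
        views.append(p[:, [0, 2]])  # drop y, treated here as the depth axis
    return np.stack(views)


def make_training_pair(points, rng, n_views=12, max_angle=np.pi / 4):
    """Sample a random rotation; return views before/after it plus its parameters."""
    angles = rng.uniform(-max_angle, max_angle, size=3)  # assumed transformation family
    transformed = points @ rotation_matrix(angles).T     # rotate each row vector
    azimuths = np.linspace(0.0, 2.0 * np.pi, n_views, endpoint=False)
    return project_views(points, azimuths), project_views(transformed, azimuths), angles


def transformation_loss(predicted, target):
    """Assumed objective: MSE between predicted and true rotation parameters."""
    return float(np.mean((np.asarray(predicted) - np.asarray(target)) ** 2))


rng = np.random.default_rng(0)
cloud = rng.normal(size=(256, 3))  # toy object: 256 random 3D points
before, after, target = make_training_pair(cloud, rng)
print(before.shape, after.shape, target.shape)  # (12, 256, 2) (12, 256, 2) (3,)
```

A model trained this way would receive `before` and `after` and try to predict `target`; if its view features are equivariant to the transformation, the rotation should be recoverable from them. Because Euler angles wrap around at ±π, plain MSE on them is only a rough regression target, which is why the sketch samples within ±π/4 by default; a real setup might prefer a different rotation parameterization.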
Taxonomy
Topics: 3D Shape Modeling and Analysis · Human Pose and Action Recognition · Robotics and Sensor-Based Localization
