Self-Supervised Multi-View Learning via Auto-Encoding 3D Transformations

Xiang Gao; Wei Hu; Guo-Jun Qi

arXiv:2103.00787 · cs.CV · March 2, 2021 · 6 cites

TL;DR

This paper introduces a self-supervised learning method for 3D object recognition that leverages multi-view transformations to learn equivariant representations without labeled data, improving classification and retrieval performance.

Contribution

The paper proposes a self-supervised paradigm, MV-TER (Multi-View Transformation Equivariant Representations), that learns 3D-transformation-equivariant features from multiple views of an object without requiring labels.

Findings

01. Outperforms state-of-the-art view-based methods in 3D classification

02. Demonstrates strong generalization to real-world datasets

03. Effective in 3D object retrieval tasks

Abstract

Learning 3D object representations is a fundamental challenge in computer vision for reasoning about the 3D world. Recent advances in deep learning have proven effective for 3D object recognition, and view-based methods have performed best so far. However, existing methods mostly learn multi-view features in a supervised fashion, which requires large amounts of costly labeled data. In contrast, self-supervised learning aims to learn multi-view feature representations without labeled data. To this end, we propose a novel self-supervised paradigm to learn Multi-View Transformation Equivariant Representations (MV-TER), exploring the equivariant transformations of a 3D object and its projected multiple views. Specifically, we perform a 3D transformation on a 3D object, and obtain multiple views before and after the transformation via…
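The data-generation step described in the abstract (transform the 3D object, then collect multiple views before and after) can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration, not the authors' pipeline: the object is a point cloud, the transformation is a single random rotation about the z axis, and `render_views` is a hypothetical stand-in that uses orthographic projection from cameras spaced evenly around the y axis. The rotation angle is returned alongside the views because the paper's title suggests the transformation itself serves as the training signal, but the objective is not spelled out in the text above.

```python
import numpy as np


def rotation_z(theta):
    """3x3 rotation matrix about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rotation_y(theta):
    """3x3 rotation matrix about the y axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def render_views(points, num_views=4):
    """Hypothetical renderer (assumption): orthographic projection of an
    (N, 3) point cloud into num_views cameras spaced around the y axis.
    Returns an array of shape (num_views, N, 2)."""
    views = []
    for k in range(num_views):
        cam = rotation_y(2.0 * np.pi * k / num_views)
        in_camera_frame = points @ cam.T      # rotate points into the camera frame
        views.append(in_camera_frame[:, :2])  # drop depth: orthographic projection
    return np.stack(views)


def make_training_sample(points, rng, num_views=4):
    """Views of the object before and after one random 3D rotation.
    The angle is a label-free target that comes from the sampling itself."""
    theta = rng.uniform(-np.pi, np.pi)        # assumed transformation family
    transformed = points @ rotation_z(theta).T
    before = render_views(points, num_views)
    after = render_views(transformed, num_views)
    return before, after, theta
```

A call such as `make_training_sample(points, np.random.default_rng(0))` returns two stacks of views and the sampled angle; no human annotation is involved, which is the sense in which such a setup is self-supervised.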

Taxonomy

Topics: 3D Shape Modeling and Analysis · Human Pose and Action Recognition · Robotics and Sensor-Based Localization