Asymmetric Dual Self-Distillation for 3D Self-Supervised Representation Learning
Remco F. Leijenaar, Hamidreza Kasaei

TL;DR
This paper introduces AsymDSD, a self-supervised learning framework for 3D point clouds that combines masked modeling and invariance learning and achieves state-of-the-art results on ScanObjectNN.
Contribution
It proposes an asymmetric dual self-distillation approach that unifies masked modeling and invariance learning in 3D representation learning.
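As a rough illustration of what self-distillation in a latent space can look like, the sketch below computes a cross-entropy between a teacher's and a student's softmax outputs and updates the teacher as an exponential moving average (EMA) of the student. It follows the general self-distillation recipe and is not taken from the authors' implementation; the function names, the temperatures, and the momentum value are all assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits, temperature):
    """Numerically stable log-softmax (avoids log(0) on tiny probabilities)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_total for s in scaled]


def distillation_loss(student_logits, teacher_logits,
                      student_temp=0.1, teacher_temp=0.05):
    """Cross-entropy from the teacher distribution to the student distribution.

    The temperatures are hypothetical; a lower teacher temperature
    sharpens the target distribution.
    """
    p_teacher = softmax(teacher_logits, teacher_temp)
    log_p_student = log_softmax(student_logits, student_temp)
    return -sum(t * ls for t, ls in zip(p_teacher, log_p_student))


def ema_update(teacher_params, student_params, momentum=0.99):
    """Move each teacher parameter slightly toward the student parameter."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]
```

In a dual setup, a loss of this kind could be applied twice: once on a global embedding of two augmented views (invariance learning) and once on per-patch embeddings of masked regions (masked modeling). How the two terms are weighted is not stated in this summary.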
Findings
Achieves 90.53% accuracy on ScanObjectNN
Pretraining improves accuracy to 93.72%
Outperforms prior methods in 3D self-supervised learning
Abstract
Learning semantically meaningful representations from unstructured 3D point clouds remains a central challenge in computer vision, especially in the absence of large-scale labeled datasets. While masked point modeling (MPM) is widely used in self-supervised 3D learning, its reconstruction-based objective can limit its ability to capture high-level semantics. We propose AsymDSD, an Asymmetric Dual Self-Distillation framework that unifies masked modeling and invariance learning through prediction in the latent space rather than the input space. AsymDSD builds on a joint embedding architecture and introduces several key design choices: an efficient asymmetric setup, disabling attention between masked queries to prevent shape leakage, multi-mask sampling, and a point cloud adaptation of multi-crop. AsymDSD achieves state-of-the-art results on ScanObjectNN (90.53%) and further improves to…
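One of the design choices listed in the abstract, disabling attention between masked queries, can be pictured as a boolean attention mask. The sketch below is one plausible construction and not the authors' code: the function name `build_attention_mask` is hypothetical, and two details are assumptions, namely that a masked query may attend to itself and that visible tokens do not attend to masked queries.

```python
def build_attention_mask(is_masked):
    """Return allow[i][j], True when token i may attend to token j.

    is_masked[k] is True when token k is a masked query and False
    when it is a visible patch token.
    """
    n = len(is_masked)
    allow = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if not is_masked[j]:
                # Every token may read from visible tokens.
                allow[i][j] = True
            elif i == j:
                # Assumption: a masked query may attend to itself.
                allow[i][j] = True
            # Otherwise j is a different masked query: attention stays
            # disabled so the positions of other masked queries cannot
            # leak information about the overall shape.
    return allow


# Two visible tokens followed by two masked queries.
mask = build_attention_mask([False, False, True, True])
```

With this mask, each masked query is predicted from the visible context alone, so the positions of the other masked patches cannot be used as a shortcut.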
Taxonomy
Topics: 3D Shape Modeling and Analysis · Robot Manipulation and Learning · Robotics and Sensor-Based Localization
