Point-SRA: Self-Representation Alignment for 3D Representation Learning

Lintong Wei; Jian Lu; Haozhe Cheng; Jihua Zhu; Kaibing Zhang

arXiv:2601.01746 · cs.CV · May 8, 2026

TL;DR

Point-SRA is a 3D representation learning method that aligns multi-level features through self-distillation and probabilistic modeling, improving performance on 3D classification, segmentation, and detection tasks.

Contribution

The paper proposes a dual self-representation alignment mechanism, applied at both the MAE level and the MeanFlow Transformer (MFT) level, to address the limitations of fixed mask ratios and point-wise reconstruction.

Findings

01. Outperforms Point-MAE by 5.37% on ScanObjectNN.

02. Achieves 96.07% mean IoU on intracranial aneurysm segmentation.

03. Surpasses MaskPoint by 5.12% AP@50 in 3D object detection.

Abstract

Masked autoencoders (MAE) have become a dominant paradigm in 3D representation learning, setting new performance benchmarks across various downstream tasks. Existing methods use a fixed mask ratio, which neglects multi-level representational correlations and intrinsic geometric structures, and they rely on point-wise reconstruction assumptions that conflict with the diversity of point clouds. To address these issues, we propose a 3D representation learning method, termed Point-SRA, which aligns representations through self-distillation and probabilistic modeling. Specifically, we assign different masking ratios to the MAE to capture complementary geometric and semantic information, while the MeanFlow Transformer (MFT) leverages cross-modal conditional embeddings to enable diverse probabilistic reconstruction. Our analysis further reveals that representations at different time steps in MFT also…
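The two ideas named in the abstract, masking the same input at different ratios and aligning the resulting representations, can be illustrated with a toy sketch. This is not the authors' implementation: the ratios 0.4 and 0.8, the function names `random_mask` and `cosine_alignment_loss`, and the choice of a cosine-based loss are all assumptions made for illustration, and the feature vectors stand in for encoder outputs that are not computed here.

```python
import math
import random


def random_mask(num_patches, mask_ratio, rng):
    """Return sorted indices of the patches hidden at one masking ratio."""
    num_masked = int(round(num_patches * mask_ratio))
    return sorted(rng.sample(range(num_patches), num_masked))


def cosine_alignment_loss(student, teacher, eps=1e-12):
    """1 - cosine similarity between two feature vectors.

    Assumption: the teacher vector is treated as a fixed target, as is
    common in self-distillation; the paper's actual loss may differ.
    """
    dot = sum(s * t for s, t in zip(student, teacher))
    norm_s = math.sqrt(sum(s * s for s in student))
    norm_t = math.sqrt(sum(t * t for t in teacher))
    return 1.0 - dot / (norm_s * norm_t + eps)


rng = random.Random(0)
num_patches = 64

# Hypothetical ratios: the abstract does not state the values used.
low_mask = random_mask(num_patches, 0.4, rng)   # more patches stay visible
high_mask = random_mask(num_patches, 0.8, rng)  # fewer patches stay visible

# Placeholder features standing in for the two branches' encoder outputs.
feat_low = [0.9, 0.1, 0.3]
feat_high = [0.8, 0.2, 0.4]
loss = cosine_alignment_loss(feat_high, feat_low)
```

In this sketch the low-ratio branch plays the teacher role because it sees more of the input; which branch supervises which in Point-SRA is not specified in the text above.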
