Dual-Branch Center-Surrounding Contrast: Rethinking Contrastive Learning for 3D Point Clouds
Shaofeng Zhang, Xuanqi Chen, Xiangdong Zhang, Sitong Wu, Junchi Yan

TL;DR
This paper introduces a dual-branch contrastive learning framework for 3D point clouds that captures rich geometric features and improves downstream task performance, surpassing existing methods on several benchmarks.
Contribution
The paper proposes a novel dual-branch center-surrounding contrastive framework with a patch-level contrastive loss for better 3D feature learning, outperforming prior generative and contrastive methods.
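To illustrate what a patch-level contrastive loss can look like, the sketch below computes a symmetric InfoNCE loss between the patch embeddings produced by two branches. It treats the same patch index in the other branch as the positive and all other patches as negatives. This is a generic formulation assumed for illustration, not the paper's exact objective: the temperature `tau`, the absence of projection heads, and the helper name `patch_info_nce` are all hypothetical.

```python
import numpy as np

def _log_softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable row-wise log-softmax.
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

def patch_info_nce(za: np.ndarray, zb: np.ndarray, tau: float = 0.1) -> float:
    """Symmetric patch-level InfoNCE (hypothetical helper, assumed form).

    za, zb: (num_patches, dim) embeddings of the same patches from two branches.
    Row i of za and row i of zb form the positive pair; all other rows are negatives.
    """
    # L2-normalise so the dot product is cosine similarity.
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / tau  # (N, N) patch-to-patch similarity matrix
    idx = np.arange(len(za))
    loss_ab = -_log_softmax(logits)[idx, idx].mean()    # branch A -> branch B
    loss_ba = -_log_softmax(logits.T)[idx, idx].mean()  # branch B -> branch A
    return float((loss_ab + loss_ba) / 2)

rng = np.random.default_rng(0)
z = rng.normal(size=(64, 128))
# Matching patches across branches should yield a much lower loss than mismatched ones.
aligned = patch_info_nce(z, z + 0.01 * rng.normal(size=z.shape))
shuffled = patch_info_nce(z, rng.permutation(z))
print(aligned < shuffled)  # True
```

Contrasting patches rather than whole-object embeddings is what makes the objective sensitive to local structure. Each patch must be distinguished from the other patches of the same shape, not only from other shapes.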
Findings
Achieves state-of-the-art results on multiple protocols.
Outperforms the Point-MAE baseline by significant margins.
Effectively captures local and high-level geometric features.
Abstract
Most existing self-supervised learning (SSL) approaches for 3D point clouds are dominated by generative methods based on Masked Autoencoders (MAE). However, these generative methods have been shown to struggle to capture high-level discriminative features effectively, leading to poor performance on linear probing and other downstream tasks. In contrast, contrastive methods excel in discriminative feature representation and generalization ability on image data. Despite this, contrastive learning (CL) for 3D data remains scarce. Moreover, simply applying CL methods designed for 2D data to 3D fails to effectively learn 3D local details. To address these challenges, we propose a novel Dual-Branch Center-Surrounding Contrast (CSCon) framework. Specifically, we apply masking to the center and surrounding parts separately, constructing dual-branch inputs with…
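The abstract is cut off before it defines the "center" and "surrounding" parts. The sketch below therefore adopts one hypothetical reading. Patches whose centers lie at or below the median distance from the cloud centroid form the center set, and the rest form the surrounding set. Branch A masks a fraction of the center patches only, and branch B masks the same fraction of the surrounding patches only. The split rule, the `ratio` parameter, and the helper names are assumptions for illustration, not the paper's procedure.

```python
import numpy as np

def center_surrounding_split(patch_centers: np.ndarray) -> np.ndarray:
    """Hypothetical rule: True for 'center' patches, i.e. those whose distance
    to the cloud centroid is at or below the median distance."""
    centroid = patch_centers.mean(axis=0)
    dist = np.linalg.norm(patch_centers - centroid, axis=1)
    return dist <= np.median(dist)

def dual_branch_masks(patch_centers: np.ndarray, ratio: float = 0.6, seed: int = 0):
    """Return two boolean masks (True = masked). Branch A masks `ratio` of the
    center patches only; branch B masks `ratio` of the surrounding patches only."""
    rng = np.random.default_rng(seed)
    is_center = center_surrounding_split(patch_centers)
    masks = []
    for region in (is_center, ~is_center):
        idx = np.flatnonzero(region)        # patch indices inside this region
        k = int(round(ratio * len(idx)))    # number of patches to mask there
        chosen = rng.choice(idx, size=k, replace=False)
        mask = np.zeros(len(patch_centers), dtype=bool)
        mask[chosen] = True
        masks.append(mask)
    return masks[0], masks[1]

# 64 synthetic patch centers standing in for, e.g., farthest-point-sampled centers (assumed).
centers = np.random.default_rng(1).normal(size=(64, 3))
mask_a, mask_b = dual_branch_masks(centers, ratio=0.5)
print(mask_a.sum(), mask_b.sum(), (mask_a & mask_b).any())  # 16 16 False
```

Under this reading the two branches never mask the same patch. Each branch therefore keeps the region the other branch hides visible, which is one way to give a patch-level contrastive loss two complementary views of the same shape.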
Peer Reviews
Decision · Submitted to ICLR 2026
1. The proposed method combines contrastive learning and MAE-style pre-training, realizing fine-grained patch-level reasoning. The method is technically sound and conceptually interesting. 2. The experimental results show relatively strong performance on various benchmarks.
1. The method is limited to object-level datasets. A major difference between contrastive learning and MAE-style pre-training is that contrastive learning can be better applied to more complex scene-level scenarios. Since the authors categorize their method as contrastive, the absence of experiments pre-training directly on scene-level datasets such as ScanNet significantly undermines the strength of the paper. 2. It would be better if the authors could analyze more thoroughly int
1. The paper correctly identifies the over-reliance of current 3D SSL on MAE-style reconstruction losses, which learn low-level geometry but only weak semantics. Addressing this via a contrastive objective that leverages 3D spatial structure is a meaningful idea. 2. Removing decoders and multi-view generation reduces computation and eases implementation. 3. This method achieves state-of-the-art performance.
1. Incremental conceptual novelty. The "center-surrounding" idea is intuitively similar to spatial partitioning already used in hybrid or region-aware methods (e.g., PointContrast's local views, ReCon's cross-patch contrast, Point-CMAE's implicit local/global separation). The core innovation largely reduces to choosing intra-sample positives differently. Without a new theoretical insight or broader unification, the contribution is modest. 2. CSCon seems like DetCo [1] in 3D, local/global co
1. Clear Expression: The paper is well-written, and the methodology is presented in a clear and logical manner. 2. Easy to Follow: The paper is well-structured and easy to follow. The authors clearly explain the methodology, experimental setup, and results, ensuring that readers can easily understand the core concepts and contributions. 3. Reasonable Complexity: The proposed method maintains reasonable complexity while achieving substantial improvements in performance. 4. Solid Experimental
1. Innovation Depth: The proposed innovation, where the encoded results remain consistent across different masking strategies for already partitioned patches, is relatively simple and straightforward. While it proves effective in improving performance, the novelty feels incremental when compared to the broader scope of 3D point cloud representation learning. 2. Comparative Analysis: The authors predominantly compare their method with older works, which highlights the strengths of their approac
Taxonomy
Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks
