Align then Adapt: Rethinking Parameter-Efficient Transfer Learning in 4D Perception
Yiding Sun, Jihua Zhu, Haozhe Cheng, Chaoyi Lu, Zhichuan Yang, Lin Chen, Yaonan Wang

TL;DR
This paper introduces PointATA, a two-stage, parameter-efficient transfer learning method that adapts 3D pre-trained models to 4D perception tasks, addressing overfitting and the 3D-to-4D modality gap while training only a small number of parameters.
Contribution
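It proposes an 'Align then Adapt' paradigm: optimal-transport theory guides a point align embedder that narrows the 3D-to-4D modality gap (Stage 1), and an efficient point-video adapter then curbs overfitting when transferring 3D models to 4D perception.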
Findings
Achieves 97.21% accuracy on 3D action recognition.
Improves 4D action segmentation by 8.7%.
Attains 84.06% on 4D semantic segmentation.
Abstract
Point cloud video understanding is critical for robotics as it accurately encodes motion and scene interaction. We recognize that 4D datasets are far scarcer than 3D ones, which hampers the scalability of self-supervised 4D models. A promising alternative is to transfer 3D pre-trained models to 4D perception tasks. However, rigorous empirical analysis reveals two critical limitations that impede transfer capability: overfitting and the modality gap. To overcome these challenges, we develop a novel "Align then Adapt" (PointATA) paradigm that decomposes parameter-efficient transfer learning into two sequential stages. Optimal-transport theory is employed to quantify the distributional discrepancy between 3D and 4D datasets, enabling our proposed point align embedder to be trained in Stage 1 to alleviate the underlying modality gap. To mitigate overfitting, an efficient point-video adapter…
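In Stage 1, optimal transport is used to quantify how far apart the 3D and 4D data distributions are. As a rough illustration only, the sketch below computes an entropy-regularized OT cost (Sinkhorn iterations in the log domain) between two uniformly weighted point sets with NumPy. It is not the authors' formulation: the squared-Euclidean ground cost, the uniform weights, the `epsilon` value, and the names `sinkhorn_cost` and `_logsumexp` are all assumptions, and the arrays passed in stand for hypothetical 3D and 4D features.

```python
import numpy as np


def _logsumexp(z, axis):
    """Numerically stable log(sum(exp(z))) along one axis."""
    z_max = z.max(axis=axis, keepdims=True)
    out = z_max + np.log(np.exp(z - z_max).sum(axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)


def sinkhorn_cost(x, y, epsilon=0.1, n_iters=1000):
    """Entropy-regularized OT between two uniform empirical distributions.

    Illustrative sketch, not the paper's method.
    x: (n, d) source samples (e.g., hypothetical 3D features).
    y: (m, d) target samples (e.g., hypothetical 4D features).
    Returns the transport cost <P, C> and the transport plan P.
    """
    n, m = x.shape[0], y.shape[0]
    log_a = np.full(n, -np.log(n))  # uniform source weights, log domain
    log_b = np.full(m, -np.log(m))  # uniform target weights, log domain
    # Ground cost (assumed): squared Euclidean distance between all pairs.
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    f = np.zeros(n)  # dual potential for the source
    g = np.zeros(m)  # dual potential for the target
    for _ in range(n_iters):
        # Update f so row sums of P match a, then g so column sums match b.
        f = epsilon * (log_a - _logsumexp((g[None, :] - cost) / epsilon, axis=1))
        g = epsilon * (log_b - _logsumexp((f[:, None] - cost) / epsilon, axis=0))
    # Recover the plan P_ij = exp((f_i + g_j - C_ij) / epsilon).
    plan = np.exp((f[:, None] + g[None, :] - cost) / epsilon)
    return float((plan * cost).sum()), plan
```

For points `x` drawn from the unit cube, `sinkhorn_cost(x, x + 0.5)` should exceed `sinkhorn_cost(x, x)` by about 0.75, the squared length of the shift, so a larger cost signals a wider distribution gap. Working in the log domain keeps `exp(-cost / epsilon)` from underflowing when `epsilon` is small.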
