Positional Prompt Tuning for Efficient 3D Representation Learning

Shaochen Zhang; Zekun Qi; Runpei Dong; Xiuxiu Bai; Xing Wei

arXiv:2408.11567 · cs.CV · September 25, 2025

TL;DR

This paper introduces PPT, a parameter-efficient fine-tuning method for 3D point cloud analysis that tunes positional encodings as prompts and reaches state-of-the-art results on several mainstream datasets with only 1.05M trainable parameters.

Contribution

The paper proposes PPT, a PEFT approach for 3D representation learning that increases the number of patch tokens and trains the positional encoding, while keeping most pre-trained model parameters frozen.

Findings

01. PPT achieves 95.01% accuracy on the ScanObjectNN OBJ_BG dataset.

02. PPT requires only 1.05 million trainable parameters.

03. Extensive experiments on several mainstream datasets show that PPT is both effective and efficient.

Abstract

We rethink the role of positional encoding in 3D representation learning and fine-tuning. We argue that positional encoding in point Transformer-based methods serves to aggregate multi-scale features of point clouds. We also explore parameter-efficient fine-tuning (PEFT) through the lens of prompts and adapters, and introduce PPT, a straightforward yet effective method for point cloud analysis. PPT incorporates increased patch tokens and trainable positional encoding while keeping most pre-trained model parameters frozen. Extensive experiments validate that PPT is both effective and efficient. With only 1.05M trainable parameters, PPT achieves state-of-the-art results on several mainstream datasets, including 95.01% accuracy on the ScanObjectNN OBJ_BG dataset. Code and weights will be released at…
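
The freezing scheme described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: `TinyPointEncoder`, its layer sizes, the 15-way head, and the `freeze_all_but` helper are hypothetical names and values chosen for illustration, and the toy model takes patch centers directly instead of grouping raw points into patches. It only shows the general pattern of freezing a pre-trained backbone and leaving the positional encoding (plus a task head) trainable.

```python
import torch
import torch.nn as nn


class TinyPointEncoder(nn.Module):
    """Toy stand-in for a pre-trained point Transformer (hypothetical, not PPT itself)."""

    def __init__(self, dim=64, depth=2, heads=4, num_classes=15):
        super().__init__()
        # Token embedding; here each patch is summarized by its 3-D center only.
        self.patch_embed = nn.Linear(3, dim)
        # Positional encoding: a small MLP over patch-center coordinates.
        self.pos_embed = nn.Sequential(
            nn.Linear(3, dim), nn.GELU(), nn.Linear(dim, dim)
        )
        # Transformer blocks that stand in for the frozen backbone.
        self.blocks = nn.Sequential(*[
            nn.TransformerEncoderLayer(
                d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True
            )
            for _ in range(depth)
        ])
        self.head = nn.Linear(dim, num_classes)

    def forward(self, centers):
        # centers: (batch, num_patches, 3)
        tokens = self.patch_embed(centers) + self.pos_embed(centers)
        # Mean-pool the tokens, then classify.
        return self.head(self.blocks(tokens).mean(dim=1))


def freeze_all_but(model, trainable_prefixes):
    """Freeze every parameter whose name does not start with one of the prefixes."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(trainable_prefixes)


def count_params(model):
    """Return (trainable, total) parameter counts."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total


model = TinyPointEncoder()
# Assumption: only the positional encoding and the task head are tuned.
freeze_all_but(model, ("pos_embed.", "head."))
trainable, total = count_params(model)

# The optimizer only receives the parameters that are still trainable.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)

# The positional MLP acts on coordinates, so the token count can vary freely.
logits = model(torch.randn(2, 128, 3))
logits_more_tokens = model(torch.randn(2, 256, 3))
print(f"trainable {trainable} of {total} parameters")
```

Because the positional encoding in this toy model is a function of the patch coordinates, feeding more patch tokens adds no parameters; whether PPT's own token increase works the same way should be checked against the released code.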

Code & Models

Repositories

zsc000722/ppt (PyTorch, official)

Taxonomy

Topics: Robotics and Sensor-Based Localization · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques

Methods: Linear Layer · Residual Connection · Multi-Head Attention · Adam · Layer Normalization · Attention Is All You Need · Position-Wise Feed-Forward Layer · Dense Connections · Byte Pair Encoding · Absolute Position Encodings