P3T: Prototypical Point-level Prompt Tuning with Enhanced Generalization for 3D Vision-Language Models

Geunyoung Jung; Soohong Kim; Kyungwoo Song; Jiyoung Jung

arXiv:2604.15703 · cs.CV · April 20, 2026

TL;DR

P3T is a parameter-efficient prompt tuning method for 3D vision-language models that enhances generalization and reduces overfitting by using instance-aware prompts and a prototypical loss.

Contribution

The paper introduces a prompt tuning approach that pairs a Point Prompter with a Text Prompter, improving the adaptation and generalization of 3D VLMs without full fine-tuning.

Findings

01. Matches or outperforms full fine-tuning in classification tasks.

02. Shows robustness under data shift in cross-dataset experiments.

03. Effective in few-shot learning scenarios.

Abstract

With the rise of pre-trained models in the 3D point cloud domain for a wide range of real-world applications, adapting them to downstream tasks has become increasingly important. However, conventional full fine-tuning is computationally expensive and storage-intensive. Although prompt tuning has emerged as an efficient alternative, it often overfits, which compromises generalization. To address this issue, we propose Prototypical Point-level Prompt Tuning (P3T), a parameter-efficient prompt tuning method designed for pre-trained 3D vision-language models (VLMs). P3T consists of two components: 1) a Point Prompter, which generates instance-aware point-level prompts for the input point cloud, and 2) a Text Prompter, which inserts learnable prompts into the input text in place of hand-crafted ones. Since both prompters operate…
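
The TL;DR mentions a prototypical loss, but the abstract is cut off before it describes one. As a rough illustration only, the sketch below uses the generic prototypical-network formulation: each class prototype is the mean of that class's embeddings, and the loss is the cross-entropy of a softmax over negative squared distances to the prototypes. This formulation is an assumption and may differ from the loss actually used in P3T. The function names and the toy 2-D embeddings are hypothetical.

```python
import math


def class_prototypes(embeddings, labels):
    """Return {label: mean embedding} (assumed definition of a prototype)."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototypical_loss(embeddings, labels, prototypes):
    """Mean cross-entropy of softmax(-squared distance to each prototype)."""
    total = 0.0
    for vec, label in zip(embeddings, labels):
        logits = {c: -squared_distance(vec, p) for c, p in prototypes.items()}
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits.values())
        log_norm = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += log_norm - logits[label]
    return total / len(embeddings)


# Toy stand-ins for point cloud embeddings (hypothetical values).
embeddings = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
labels = ["chair", "chair", "table", "table"]

prototypes = class_prototypes(embeddings, labels)
loss = prototypical_loss(embeddings, labels, prototypes)
print(prototypes)
print(loss)
```

In this toy case the two classes are well separated, so every embedding sits much closer to its own prototype than to the other one and the loss is close to zero. Pulling embeddings toward class-level prototypes is one common way to discourage overfitting to individual training samples.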

Code & Models

Repositories

gyjung975/P3T (GitHub)
