PEVA-Net: Prompt-Enhanced View Aggregation Network for Zero/Few-Shot Multi-View 3D Shape Recognition
Dongyun Lin, Yi Cheng, Shangbo Mao, Aiyuan Guo, Yiqun Li

TL;DR
PEVA-Net leverages CLIP and prompt-based strategies to improve zero/few-shot 3D shape recognition from multi-view images, combining view aggregation with self-distillation for enhanced performance.
Contribution
The paper introduces PEVA-Net, a novel network that integrates prompt-enhanced view aggregation and self-distillation to address zero/few-shot 3D shape recognition.
Findings
Effective zero-shot recognition using category prompts.
Significant improvement in few-shot learning via self-distillation.
Unified approach for zero/few-shot 3D shape recognition.
Abstract
Large vision-language models have impressively promoted the performance of 2D visual recognition under zero/few-shot scenarios. In this paper, we focus on exploiting a large vision-language model, i.e., CLIP, to address zero/few-shot 3D shape recognition based on multi-view representations. The key challenge for both tasks is to generate a discriminative descriptor of a 3D shape represented by multiple view images, either without explicit training (zero-shot 3D shape recognition) or with training on a limited amount of data (few-shot 3D shape recognition). We observe that the two tasks are related and can be addressed jointly. Specifically, using the descriptor that is effective for zero-shot inference to guide the tuning of the aggregated descriptor during few-shot training can significantly improve few-shot learning efficacy. Hence, we propose…
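The abstract is truncated before the method is described, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes per-view image embeddings and per-category prompt embeddings are already available (for example from CLIP's image and text encoders), aggregates views with a confidence weighting chosen here purely for illustration, and shows how a zero-shot prediction could act as a teacher for a few-shot student through a KL term. The function names, the weighting scheme, and the temperature value are all assumptions.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def zero_shot_aggregate(view_feats, text_feats, temperature=0.01):
    """Aggregate view embeddings into one shape descriptor (illustrative).

    view_feats: (V, D) array, one embedding per rendered view.
    text_feats: (C, D) array, one embedding per category prompt.
    Returns the unit-length descriptor (D,) and class probabilities (C,).
    """
    v = l2_normalize(view_feats)
    t = l2_normalize(text_feats)
    sims = v @ t.T  # (V, C) cosine similarities between views and prompts
    # Assumed weighting: a view counts more when its prediction is more peaked.
    conf = softmax(sims / temperature, axis=1).max(axis=1)  # (V,)
    weights = conf / conf.sum()
    descriptor = l2_normalize(weights @ v)  # weighted mean of view embeddings
    probs = softmax((descriptor @ t.T) / temperature)
    return descriptor, probs


def distillation_loss(student_logits, teacher_probs, eps=1e-12):
    """KL(teacher || student), with the zero-shot prediction as teacher."""
    student_probs = softmax(student_logits)
    return float(
        np.sum(teacher_probs * (np.log(teacher_probs + eps) - np.log(student_probs + eps)))
    )


# Random placeholders standing in for real encoder outputs.
rng = np.random.default_rng(0)
demo_views = rng.normal(size=(12, 512))    # 12 views of one shape
demo_prompts = rng.normal(size=(40, 512))  # 40 category prompts
_, demo_probs = zero_shot_aggregate(demo_views, demo_prompts)
print("predicted category index:", int(demo_probs.argmax()))
```

In a few-shot setting, a term like `distillation_loss` would typically be added to the usual classification loss so that the trained aggregation stays close to the zero-shot prediction; how the paper weights or structures this is not stated in the truncated abstract.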
Taxonomy
Topics: Medical Image Segmentation Techniques · Medical Imaging and Analysis · Industrial Vision Systems and Defect Detection
Methods: Contrastive Language-Image Pre-training · Focus
