PEVA-Net: Prompt-Enhanced View Aggregation Network for Zero/Few-Shot Multi-View 3D Shape Recognition

Dongyun Lin; Yi Cheng; Shangbo Mao; Aiyuan Guo; Yiqun Li

arXiv:2404.19168 · cs.CV · May 1, 2024

Open Access

TL;DR

PEVA-Net leverages CLIP and prompt-based strategies to improve zero/few-shot 3D shape recognition from multi-view images, combining view aggregation with self-distillation for enhanced performance.

Contribution

The paper introduces PEVA-Net, a novel network that integrates prompt-enhanced view aggregation and self-distillation to address zero/few-shot 3D shape recognition.

Findings

1. Effective zero-shot recognition using category prompts.
2. Significant improvement in few-shot learning via self-distillation.
3. Unified approach for zero/few-shot 3D shape recognition.

Abstract

Large vision-language models have markedly improved the performance of 2D visual recognition in zero/few-shot scenarios. In this paper, we focus on exploiting a large vision-language model, namely CLIP, to address zero/few-shot 3D shape recognition based on multi-view representations. The key challenge for both tasks is to generate a discriminative descriptor of a 3D shape represented by multiple view images, either without explicit training (zero-shot 3D shape recognition) or with training on a limited amount of data (few-shot 3D shape recognition). We observe that the two tasks are related and can be addressed together. Specifically, using the descriptor that is effective for zero-shot inference to guide the tuning of the aggregated descriptor during few-shot training can significantly improve few-shot learning efficacy. Hence, we propose…
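The zero-shot setting described in the abstract can be illustrated with a minimal sketch. It is not the paper's aggregation scheme, which the truncated abstract does not spell out. It assumes that per-view image features and per-category prompt features have already been extracted by a CLIP-like encoder, and it swaps in plain mean pooling as the view aggregation. The function names (`l2_normalize`, `mean_pool`, `zero_shot_classify`) and the toy three-dimensional features are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def mean_pool(view_features):
    """Average per-view feature vectors into a single shape descriptor.

    Mean pooling is an assumption made for illustration only.
    """
    n_views = len(view_features)
    dim = len(view_features[0])
    return [sum(v[d] for v in view_features) / n_views for d in range(dim)]


def zero_shot_classify(view_features, prompt_features, class_names):
    """Pick the category whose prompt feature is most similar to the
    aggregated multi-view descriptor (cosine similarity)."""
    normalized_views = [l2_normalize(v) for v in view_features]
    descriptor = l2_normalize(mean_pool(normalized_views))
    scores = {}
    for name, prompt in zip(class_names, prompt_features):
        unit_prompt = l2_normalize(prompt)
        scores[name] = sum(a * b for a, b in zip(descriptor, unit_prompt))
    best = max(scores, key=scores.get)
    return best, scores


# Toy stand-ins for encoder outputs: three views of one shape and two
# category prompts. Real features would come from the image and text
# encoders of a vision-language model.
views = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [1.0, 0.0, 0.1]]
prompts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
label, scores = zero_shot_classify(views, prompts, ["chair", "table"])
print(label, scores)
```

No training is involved here: the prediction depends only on how well the aggregated descriptor aligns with each category prompt, which is why the quality of the aggregation matters in the zero-shot case.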


Taxonomy

Topics: Medical Image Segmentation Techniques · Medical Imaging and Analysis · Industrial Vision Systems and Defect Detection

Methods: Contrastive Language-Image Pre-training · Focus