Prompt-guided Scene Generation for 3D Zero-Shot Learning

Majid Nasiri; Ali Cheraghian; Townim Faisal Chowdhury; Sahar Ahmadi; Morteza Saberi; Shafin Rahman

arXiv:2209.14690 · cs.CV · September 30, 2022

Open Access

TL;DR

This paper introduces a prompt-guided scene generation approach for 3D zero-shot learning, leveraging scene augmentation and contrastive learning to enhance recognition of unseen objects in 3D point cloud data.

Contribution

It proposes a novel prompt-guided scene generation method that improves 3D zero-shot learning by augmenting data and utilizing scene context, outperforming existing methods.

Findings

01. Achieved state-of-the-art ZSL performance on ModelNet40 and ModelNet10 datasets.

02. Demonstrated improved generalized ZSL results on synthetic and real 3D datasets.

03. Utilized contrastive learning with scene augmentation for better object recognition.

Abstract

Zero-shot learning on 3D point cloud data is a relatively underexplored problem compared to its 2D image counterpart. 3D data brings new challenges for ZSL due to the unavailability of robust pre-trained feature extraction models. To address this problem, we propose a prompt-guided 3D scene generation and supervision method that augments 3D data to train the network better, exploring the complex interplay of seen and unseen objects. First, we merge point clouds of two 3D models in certain ways described by a prompt. The prompt acts like the annotation describing each 3D scene. Later, we perform contrastive learning to train our proposed architecture in an end-to-end manner. We argue that 3D scenes can relate objects more efficiently than single objects because popular language models (like BERT) can achieve high performance when objects appear in a context. Our proposed prompt-guided scene…
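The scene-generation step described in the abstract (merging two point clouds in a way described by a prompt) could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the relation vocabulary, the prompt template, the `gap` parameter, and the function names `normalize` and `make_scene` are all hypothetical, and the abstract does not specify how objects are placed or how prompts are worded. The resulting (scene, prompt) pair is the kind of input a contrastive objective could then consume.

```python
import math
import random

# Hypothetical relation vocabulary (an assumption, not from the paper):
# each relation maps to the direction in which the second object is shifted.
RELATIONS = {
    "next to": (1.0, 0.0, 0.0),
    "on top of": (0.0, 0.0, 1.0),
}


def normalize(points):
    """Center a point cloud at the origin and scale it into the unit sphere."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [tuple(p[i] - centroid[i] for i in range(3)) for p in points]
    # Fall back to 1.0 if every point coincides with the centroid.
    radius = max(math.sqrt(sum(c * c for c in p)) for p in centered) or 1.0
    return [tuple(c / radius for c in p) for p in centered]


def make_scene(points_a, label_a, points_b, label_b, relation, gap=2.0):
    """Merge two objects into one scene; return (scene_points, prompt)."""
    dx, dy, dz = RELATIONS[relation]
    a = normalize(points_a)
    # Shift object B along the relation direction; with gap=2.0 the two
    # unit spheres sit side by side instead of interpenetrating.
    b = [(x + gap * dx, y + gap * dy, z + gap * dz)
         for x, y, z in normalize(points_b)]
    # Assumed prompt template: B is described relative to A.
    prompt = f"a {label_b} {relation} a {label_a}"
    return a + b, prompt


# Usage with random stand-in point clouds (real data would come from a
# dataset such as ModelNet40).
rng = random.Random(0)
chair = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(64)]
lamp = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(64)]
scene, prompt = make_scene(chair, "chair", lamp, "lamp", "on top of")
print(len(scene), prompt)
```

Normalizing each object before merging keeps the offset meaningful regardless of the original scale of the two models; a real pipeline would likely also resample the merged scene to a fixed number of points.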

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition · Multimodal Machine Learning Applications

Methods: Contrastive Learning