Zero-Shot Multi-Modal Artist-Controlled Retrieval and Exploration of 3D Object Sets

Kristofer Schlachter; Benjamin Ahlbrand; Zhu Wang; Valerio Ortenzi; Ken Perlin

arXiv:2209.00682·cs.CV·September 5, 2022

Open Access

TL;DR

This paper introduces a zero-shot multi-modal retrieval system for 3D assets that leverages CLIP embeddings to enable artist-controlled exploration using sketches, images, and text, improving artistic control and flexibility.

Contribution

The work presents a novel multi-modal retrieval framework that combines different input features for 3D asset search, addressing the lack of artistic control in existing data-driven methods.

Findings

01

Effective multi-modal fusion improves retrieval accuracy.

02

Flexible combination of input features enhances artistic control.

03

System supports zero-shot retrieval without task-specific training.

Abstract

When creating 3D content, highly specialized skills are generally needed to design and generate models of objects and other assets by hand. We address this problem through high-quality 3D asset retrieval from multi-modal inputs, including 2D sketches, images and text. We use CLIP as it provides a bridge to higher-level latent features. We use these features to perform a multi-modality fusion to address the lack of artistic control that affects common data-driven approaches. Our approach allows for multi-modal conditional feature-driven retrieval through a 3D asset database, by utilizing a combination of input latent embeddings. We explore the effects of different combinations of feature embeddings across different input types and weighting methods.
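The fusion-then-retrieval idea in the abstract can be illustrated with a minimal sketch. It assumes a simple weighted sum of unit-normalized embeddings followed by cosine-similarity ranking; the abstract does not specify the weighting methods the authors actually use, so this is one plausible reading rather than the paper's method. The function names (`normalize`, `fuse`, `retrieve`) are hypothetical, and the 3-dimensional toy vectors stand in for real CLIP embeddings of a sketch, a text prompt, and the database assets.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def fuse(embeddings, weights):
    """Assumed fusion rule: weighted sum of unit embeddings, re-normalized."""
    if len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per embedding")
    dim = len(embeddings[0])
    fused = [0.0] * dim
    for emb, w in zip(embeddings, weights):
        unit = normalize(emb)
        for i in range(dim):
            fused[i] += w * unit[i]
    return normalize(fused)


def retrieve(query, database, top_k=3):
    """Rank database entries by cosine similarity to a unit-length query."""
    scored = []
    for name, vec in database.items():
        score = sum(q * a for q, a in zip(query, normalize(vec)))
        scored.append((score, name))
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top_k]]


# Toy stand-ins for CLIP embeddings (real ones have hundreds of dimensions).
database = {
    "chair": [1.0, 0.0, 0.0],
    "table": [0.0, 1.0, 0.0],
    "lamp": [0.0, 0.0, 1.0],
}
sketch_emb = [0.9, 0.1, 0.0]  # pretend the sketch looks like a chair
text_emb = [0.1, 0.9, 0.0]    # pretend the text prompt describes a table

# The artist steers the result by changing the per-modality weights.
sketch_heavy = fuse([sketch_emb, text_emb], [0.8, 0.2])
text_heavy = fuse([sketch_emb, text_emb], [0.2, 0.8])

print(retrieve(sketch_heavy, database, top_k=1)[0][0])  # chair
print(retrieve(text_heavy, database, top_k=1)[0][0])    # table
```

Shifting weight from the sketch to the text changes the top result from the chair to the table, which is the kind of artist-controlled steering the abstract describes. In practice the toy vectors would be replaced by embeddings from a CLIP image or text encoder.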

Taxonomy

Topics: Human Pose and Action Recognition · 3D Surveying and Cultural Heritage · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Language-Image Pre-training