CLIP-Mesh: Generating textured meshes from text using pretrained image-text models

Nasir Mohammad Khalid; Tianhao Xie; Eugene Belilovsky; Tiberiu Popa

arXiv:2203.13333·cs.CV·September 7, 2022

TL;DR

CLIP-Mesh enables zero-shot creation of textured 3D models from text prompts by optimizing mesh parameters with a pre-trained CLIP model, without requiring 3D training data.

Contribution

The paper introduces a method for text-driven 3D model generation that optimizes mesh and texture parameters directly against CLIP, with no 3D supervision and no training of a generative model.

Findings

01. Produces plausible textured 3D meshes from text prompts.

02. Operates without 3D supervision or training of generative models.

03. Uses image augmentations and a pretrained prior to constrain the optimization.

Abstract

We present a technique for zero-shot generation of a 3D model using only a target text prompt. Without any 3D supervision, our method deforms the control shape of a limit subdivided surface along with its texture map and normal map to obtain a 3D asset that corresponds to the input text prompt and can be easily deployed into games or modeling applications. We rely only on a pre-trained CLIP model that compares the input text prompt with differentiably rendered images of our 3D model. While previous works have focused on stylization or required training of generative models, we perform optimization on mesh parameters directly to generate shape, texture, or both. To constrain the optimization to produce plausible meshes and textures, we introduce a number of techniques using image augmentations and a pretrained prior that generates CLIP image embeddings given a text embedding.
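The optimization loop the abstract describes can be illustrated with a toy sketch. This is not the authors' implementation: `toy_render_and_embed` and `toy_prior` are hypothetical stand-ins for the differentiable renderer plus CLIP image encoder and for the pretrained prior, the embeddings are 4-dimensional made-up vectors, the `prior_weight` blend is an assumption, and finite differences replace the automatic differentiation a real pipeline would use. It only shows the shape of the idea: adjust parameters so that embeddings of several views move toward the text embedding and toward the prior's image embedding.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)

def toy_render_and_embed(params, view):
    # Hypothetical stand-in for "differentiably render the model from one
    # camera view, then encode the image with CLIP". Here: per-view scaling.
    return [p * w for p, w in zip(params, view)]

def toy_prior(text_emb):
    # Hypothetical stand-in for the pretrained prior that maps a text
    # embedding to an image embedding. Here: a small fixed offset.
    return [x + 0.1 for x in text_emb]

def loss(params, text_emb, views, prior_weight=0.5):
    """One minus the weighted mean similarity of all views to both targets."""
    image_target = toy_prior(text_emb)
    total = 0.0
    for view in views:
        emb = toy_render_and_embed(params, view)
        total += cosine(emb, text_emb) + prior_weight * cosine(emb, image_target)
    return 1.0 - total / (len(views) * (1.0 + prior_weight))

def numeric_grad(f, params, eps=1e-5):
    # Central finite differences; a real pipeline would use autodiff.
    grad = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def optimize(text_emb, views, steps=200, lr=0.5):
    # Arbitrary starting parameters, playing the role of the initial shape.
    params = [1.0] + [0.0] * (len(text_emb) - 1)
    objective = lambda p: loss(p, text_emb, views)
    history = [objective(params)]
    for _ in range(steps):
        grad = numeric_grad(objective, params)
        params = [p - lr * g for p, g in zip(params, grad)]
        history.append(objective(params))
    return params, history

# Made-up text embedding and three made-up "camera views".
text_emb = [1.0, 2.0, -1.0, 0.5]
views = [[1.0, 1.0, 1.0, 1.0], [0.9, 1.1, 1.0, 1.0], [1.1, 0.9, 1.0, 1.05]]
params, history = optimize(text_emb, views)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

Averaging the loss over several views is what keeps a single flattering viewpoint from dominating. In the actual method, the parameters would be the control mesh vertices, texture map, and normal map rather than a short vector.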

Taxonomy

Methods: Contrastive Language-Image Pre-training