Semantify: Simplifying the Control of 3D Morphable Models using CLIP
Omer Gralnik, Guy Gafni, Ariel Shamir

TL;DR
Semantify leverages CLIP's semantic understanding to create an intuitive, self-supervised method for controlling 3D morphable models through a simple slider interface, enabling easy shape manipulation and fitting to images.
Contribution
It introduces a novel self-supervised approach that uses CLIP to map semantic descriptors to 3D model parameters without human intervention.
Findings
Enables effective control of various 3D morphable models through semantic descriptors.
Enables instant fitting of 3D models to in-the-wild images.
Provides an intuitive slider-based interface for 3D shape editing.
Abstract
We present Semantify: a self-supervised method that utilizes the semantic power of the CLIP language-vision foundation model to simplify the control of 3D morphable models. Given a parametric model, training data is created by randomly sampling the model's parameters, creating various shapes and rendering them. The similarity between the output images and a set of word descriptors is calculated in CLIP's latent space. Our key idea is first to choose a small set of semantically meaningful and disentangled descriptors that characterize the 3DMM, and then learn a non-linear mapping from scores across this set to the parametric coefficients of the given 3DMM. The non-linear mapping is defined by training a neural network without a human-in-the-loop. We present results on numerous 3DMMs: body shape models, face shape and expression models, as well as animal shapes. We demonstrate how our method…
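The pipeline described in the abstract (sample parameters, render, score against descriptors, learn a mapping from scores back to parameters) can be sketched in a few lines. This is a toy illustration and not the authors' implementation: `fake_image_embedding` and `TEXT_EMB` are hypothetical stand-ins for the renderer and the CLIP encoders, the descriptor words and all sizes are made up, and the small one-hidden-layer network is only an assumed example of a non-linear mapping.

```python
import math
import random

random.seed(0)

N_PARAMS = 3    # toy number of 3DMM coefficients (assumption)
EMBED_DIM = 8   # toy embedding size (assumption)
HIDDEN = 16     # hidden units of the illustrative mapping network
DESCRIPTORS = ["tall", "muscular", "slim", "broad"]  # illustrative words only

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

# Hypothetical stand-in for CLIP's text encoder: one fixed vector per word.
TEXT_EMB = {w: [random.gauss(0, 1) for _ in range(EMBED_DIM)] for w in DESCRIPTORS}
# Hypothetical stand-in for "render the shape, then encode the image".
PROJ = [[random.gauss(0, 1) for _ in range(N_PARAMS)] for _ in range(EMBED_DIM)]

def fake_image_embedding(params):
    return [math.tanh(sum(w * p for w, p in zip(row, params))) for row in PROJ]

def descriptor_scores(params):
    """Similarity between the (fake) image embedding and each descriptor."""
    img = fake_image_embedding(params)
    return [cosine(img, TEXT_EMB[w]) for w in DESCRIPTORS]

def make_dataset(n):
    """Randomly sample parameters and pair them with their descriptor scores."""
    data = []
    for _ in range(n):
        params = [random.gauss(0, 1) for _ in range(N_PARAMS)]
        data.append((descriptor_scores(params), params))
    return data

def init_mlp(n_in, n_hidden, n_out):
    w1 = [[random.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[random.gauss(0, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    return w1, [0.0] * n_hidden, w2, [0.0] * n_out

def forward(mlp, x):
    w1, b1, w2, b2 = mlp
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    y = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(w2, b2)]
    return h, y

def mse(mlp, data):
    total = 0.0
    for x, t in data:
        _, y = forward(mlp, x)
        total += sum((yi - ti) ** 2 for yi, ti in zip(y, t))
    return total / len(data)

def sgd_step(mlp, x, t, lr):
    """One stochastic gradient step on the squared error for a single sample."""
    w1, b1, w2, b2 = mlp
    h, y = forward(mlp, x)
    dy = [2 * (yi - ti) for yi, ti in zip(y, t)]
    # Backpropagate to the hidden layer before the output weights change.
    dh = [sum(dy[k] * w2[k][j] for k in range(len(dy))) * (1 - h[j] ** 2)
          for j in range(len(h))]
    for k in range(len(dy)):
        for j in range(len(h)):
            w2[k][j] -= lr * dy[k] * h[j]
        b2[k] -= lr * dy[k]
    for j in range(len(h)):
        for i in range(len(x)):
            w1[j][i] -= lr * dh[j] * x[i]
        b1[j] -= lr * dh[j]

data = make_dataset(200)
mlp = init_mlp(len(DESCRIPTORS), HIDDEN, N_PARAMS)
loss_before = mse(mlp, data)
for _ in range(30):
    for x, t in data:
        sgd_step(mlp, x, t, lr=0.01)
loss_after = mse(mlp, data)

# "Slider" use: choose descriptor scores, read off predicted coefficients.
_, predicted_params = forward(mlp, [0.3, 0.1, -0.2, 0.0])
```

In a real setting the two stand-ins would be replaced by an actual renderer and CLIP's image and text encoders; the training loop needs no human labels because the sampled parameters themselves serve as targets.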
Videos
Semantify: Simplifying the Control of 3D Morphable Models Using CLIP (YouTube)
Taxonomy
Topics: 3D Shape Modeling and Analysis · Face recognition and analysis · Generative Adversarial Networks and Image Synthesis
Methods: Contrastive Language-Image Pre-training
