CLIPtortionist: Zero-shot Text-driven Deformation for Manufactured 3D Shapes
Xianghao Xu, Srinath Sridhar, Daniel Ritchie

TL;DR
This paper introduces CLIPtortionist, a zero-shot system for deforming 3D meshes of manufactured objects based on text descriptions, using a novel deformation model and global optimization to achieve realistic results.
Contribution
It presents a new deformation model called BoxDefGraph and a zero-shot optimization framework using CLIP and CMA-ES for text-driven 3D shape deformation.
Findings
Outperforms baseline methods in producing realistic deformations
Uses BoxDefGraph to effectively capture object geometry features
Global optimization with CMA-ES improves results over gradient-based methods
Abstract
We propose a zero-shot text-driven 3D shape deformation system that deforms an input 3D mesh of a manufactured object to fit an input text description. To do this, our system optimizes the parameters of a deformation model to maximize an objective function based on the widely used pre-trained vision-language model CLIP. We find that CLIP-based objective functions exhibit many spurious local optima; to circumvent them, we parameterize deformations using a novel deformation model called BoxDefGraph, which our system automatically computes from an input mesh. The BoxDefGraph is designed to capture the object-aligned rectangular/circular geometry features of most manufactured objects. We then use the CMA-ES global optimization algorithm to maximize our objective, which we find to work better than popular gradient-based optimizers. We demonstrate that our approach produces appealing results…
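The optimization loop described in the abstract can be illustrated with a minimal, hedged sketch. It is not the authors' implementation. The function toy_objective is a hypothetical stand-in for the CLIP-based score (a negated Rastrigin function, chosen only because it has many local optima), the parameter vector stands in for deformation parameters, and the optimizer is a simple cross-entropy-style evolution strategy with a diagonal Gaussian, a much simpler relative of CMA-ES that does not adapt a full covariance matrix.

```python
import math
import random


def toy_objective(params):
    """Hypothetical stand-in for a CLIP similarity score.

    Negated Rastrigin function: many local maxima, global maximum 0 at the origin.
    """
    return -sum(x * x - 10.0 * math.cos(2.0 * math.pi * x) + 10.0 for x in params)


def evolution_strategy(objective, dim, iterations=200, population=32,
                       elite=8, sigma=2.0, seed=0):
    """Derivative-free maximization by sampling around a Gaussian search distribution.

    Simplified relative of CMA-ES: only a per-dimension standard deviation is
    adapted, not a full covariance matrix.
    """
    rng = random.Random(seed)
    mean = [rng.uniform(-4.0, 4.0) for _ in range(dim)]
    std = [sigma] * dim
    best_params, best_score = list(mean), objective(mean)

    for _ in range(iterations):
        # Sample candidate parameter vectors around the current mean.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(population)]
        # Rank candidates by objective value, best first.
        samples.sort(key=objective, reverse=True)
        top = samples[:elite]

        # Track the best candidate seen so far.
        score = objective(top[0])
        if score > best_score:
            best_params, best_score = list(top[0]), score

        # Refit the search distribution to the elite candidates.
        mean = [sum(p[i] for p in top) / elite for i in range(dim)]
        std = [max(1e-3, math.sqrt(sum((p[i] - mean[i]) ** 2 for p in top) / elite))
               for i in range(dim)]

    return best_params, best_score


if __name__ == "__main__":
    params, score = evolution_strategy(toy_objective, dim=3)
    print("best parameters:", params)
    print("best score:", score)
```

Because the search only needs objective values, not gradients, the same loop would apply to a score computed by rendering a deformed mesh and comparing it with a text prompt, which is the setting the abstract describes.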
Taxonomy
Topics: Innovations in Concrete and Construction Materials · Additive Manufacturing and 3D Printing Technologies · Modular Robots and Swarm Intelligence
Methods: Contrastive Language-Image Pre-training
