Handle-based Mesh Deformation Guided By Vision Language Model
Xingpeng Sun, Shiyang Jia, Zherong Pan, Kui Wu, Aniket Bera

TL;DR
This paper presents a training-free, handle-based mesh deformation method guided by a Vision-Language Model, which interprets user instructions and produces high-quality deformations with minimal manual tuning.
Contribution
The paper introduces a training-free approach that uses a Vision-Language Model and multi-view voting to improve the quality and automation of mesh deformation.
Findings
Produces deformations closely aligned with user intent
Achieves higher CLIP and GPTEval3D scores than prior methods
Maintains low distortion as measured by membrane energy
Abstract
Mesh deformation is a fundamental tool in 3D content manipulation. Despite extensive prior research, existing approaches often suffer from low output quality, require significant manual tuning, or depend on data-intensive training. To address these limitations, we introduce a training-free, handle-based mesh deformation method. Our core idea is to leverage a Vision-Language Model (VLM) to interpret and manipulate a handle-based interface through prompt engineering. We begin by applying cone singularity detection to identify a sparse set of potential handles. The VLM is then prompted to select both the deformable sub-parts of the mesh and the handles that best align with user instructions. Subsequently, we query the desired deformed positions of the selected handles in screen space. To reduce uncertainty inherent in VLM predictions, we aggregate the results from multiple camera views…
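The multi-view aggregation step can be illustrated with a minimal sketch. The abstract is truncated and does not give the aggregation rule, so everything here is an assumption: the function name `vote_handles`, the `min_votes` threshold, and the idea that each view returns a list of integer handle ids are all hypothetical. It shows only simple majority-style voting over per-view handle selections, not the authors' actual procedure.

```python
from collections import Counter


def vote_handles(per_view_choices, min_votes=2):
    """Return handle ids chosen in at least `min_votes` views, most-voted first.

    `per_view_choices` is a list with one entry per camera view; each entry is
    the list of handle ids a (hypothetical) VLM query selected in that view.
    """
    counts = Counter()
    for choices in per_view_choices:
        # Count each handle at most once per view so no single view dominates.
        counts.update(set(choices))
    # Sort by descending vote count, breaking ties by handle id for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [handle for handle, votes in ranked if votes >= min_votes]


# Hypothetical selections from four rendered views of the same mesh.
views = [[3, 7], [3], [7, 12], [3, 7]]
selected = vote_handles(views)
```

Handles that appear in only one view (here, handle 12) are dropped, which is one simple way to suppress view-specific VLM errors. A threshold relative to the number of views would work equally well under the same assumptions.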
Taxonomy
Topics: Interactive and Immersive Displays · 3D Shape Modeling and Analysis · Robot Manipulation and Learning
Methods: Contrastive Language-Image Pre-training · Sparse Evolutionary Training · ALIGN
