Handle-based Mesh Deformation Guided By Vision Language Model

Xingpeng Sun; Shiyang Jia; Zherong Pan; Kui Wu; Aniket Bera

arXiv:2506.04562 · cs.GR · August 22, 2025

Open Access

TL;DR

This paper presents a training-free, handle-based mesh deformation method in which a Vision-Language Model interprets user instructions, producing high-quality deformations with minimal manual tuning.

Contribution

It introduces a novel, training-free approach that leverages a Vision-Language Model and multi-view voting to improve mesh deformation quality and automation.
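The multi-view voting idea can be illustrated with a minimal sketch. It assumes, hypothetically, that the VLM query for each camera view returns a set of integer handle IDs, and it keeps a handle only if a strict majority of views selected it. The function name `vote_handles` and the majority threshold are illustrative choices, not details taken from the paper.

```python
from collections import Counter


def vote_handles(per_view_selections, min_fraction=0.5):
    """Keep handle IDs selected in more than `min_fraction` of the views.

    per_view_selections: one iterable of handle IDs per camera view
    (a hypothetical format for the per-view VLM answers).
    """
    n_views = len(per_view_selections)
    if n_views == 0:
        return []
    counts = Counter()
    for selection in per_view_selections:
        # Count each handle at most once per view, even if repeated.
        counts.update(set(selection))
    return sorted(h for h, c in counts.items() if c / n_views > min_fraction)
```

For example, `vote_handles([{3, 7}, {3, 7, 12}, {3}, {3, 7}])` keeps handles 3 and 7 (chosen in 4 and 3 of 4 views) and drops handle 12 (1 of 4). A strict majority is only one possible rule; a weighted vote or a different threshold would fit the same interface.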

Findings

01. Produces deformations closely aligned with user intent
02. Achieves higher CLIP and GPTEval3D scores than prior methods
03. Maintains low distortion, as measured by membrane energy
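Membrane energy measures in-plane stretching and shearing of a surface, so rigid motions score zero and distortion raises the score. The exact formulation used for evaluation is not stated here, so the sketch below assumes one common discretization, a per-triangle St. Venant-Kirchhoff membrane energy. With rest and deformed first fundamental forms A and a of a triangle (Gram matrices of its two edge vectors), the strain is M = 0.5 * inv(A) * (a - A) and the energy is the rest-area-weighted sum of (lam/2) * tr(M)^2 + mu * tr(M^2). The Lamé parameters `lam` and `mu` default to 1 purely for illustration.

```python
import math


def _dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def first_fundamental_form(p0, p1, p2):
    """2x2 metric [[e1.e1, e1.e2], [e1.e2, e2.e2]] of a triangle's edges."""
    e1 = [p1[k] - p0[k] for k in range(3)]
    e2 = [p2[k] - p0[k] for k in range(3)]
    return [[_dot(e1, e1), _dot(e1, e2)], [_dot(e1, e2), _dot(e2, e2)]]


def membrane_energy(rest, deformed, triangles, lam=1.0, mu=1.0):
    """Sum of per-triangle StVK membrane energies, weighted by rest area.

    rest, deformed: lists of 3D vertex positions (same indexing).
    triangles: vertex-index triples.
    lam, mu: Lame parameters (illustrative defaults).
    """
    total = 0.0
    for i, j, k in triangles:
        A = first_fundamental_form(rest[i], rest[j], rest[k])
        a = first_fundamental_form(deformed[i], deformed[j], deformed[k])
        det = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # = |e1 x e2|^2 at rest
        A_inv = [[A[1][1] / det, -A[0][1] / det],
                 [-A[1][0] / det, A[0][0] / det]]
        # Green strain in mixed form: M = 0.5 * A^{-1} (a - A)
        D = [[0.5 * (a[r][c] - A[r][c]) for c in range(2)] for r in range(2)]
        M = [[sum(A_inv[r][m] * D[m][c] for m in range(2)) for c in range(2)]
             for r in range(2)]
        tr_M = M[0][0] + M[1][1]
        tr_M2 = M[0][0] ** 2 + 2.0 * M[0][1] * M[1][0] + M[1][1] ** 2  # tr(M @ M)
        rest_area = 0.5 * math.sqrt(det)
        total += rest_area * (0.5 * lam * tr_M ** 2 + mu * tr_M2)
    return total
```

Using the mixed tensor inv(A)(a - A) makes the result independent of how each triangle is placed in space, so a rotated or translated copy of the rest mesh has zero energy. Uniformly scaling the unit right triangle (0,0,0), (1,0,0), (0,1,0) by 2 gives M = 1.5 I and an energy of 4.5 with the default parameters.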

Abstract

Mesh deformation is a fundamental tool in 3D content manipulation. Despite extensive prior research, existing approaches often suffer from low output quality, require significant manual tuning, or depend on data-intensive training. To address these limitations, we introduce a training-free, handle-based mesh deformation method. Our core idea is to leverage a Vision-Language Model (VLM) to interpret and manipulate a handle-based interface through prompt engineering. We begin by applying cone singularity detection to identify a sparse set of potential handles. The VLM is then prompted to select both the deformable sub-parts of the mesh and the handles that best align with user instructions. Subsequently, we query the desired deformed positions of the selected handles in screen space. To reduce the uncertainty inherent in VLM predictions, we aggregate the results from multiple camera views…
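The abstract queries handle positions in screen space and aggregates them across camera views, but the aggregation rule itself is elided. As one plausible reading, stated here as an assumption, each view is an orthographic camera with known orthonormal right and up axes, and the VLM's answer is a 2D target (s, t) measured along those axes from the world origin. Each view then contributes the linear constraints right·p = s and up·p = t, and solving the stacked constraints in the least-squares sense gives a 3D target p that averages out disagreement between views. The names `lift_to_3d` and `_det3` are hypothetical.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def lift_to_3d(views, eps=1e-12):
    """Least-squares 3D point from per-view orthographic screen-space targets.

    views: list of (right, up, (s, t)); right and up are orthonormal
    world-space camera axes (3-tuples), and (s, t) is the hypothetical
    2D target read along those axes from the world origin.
    """
    # Normal equations: (sum_k d_k d_k^T) p = sum_k value_k * d_k
    N = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for right, up, (s, t) in views:
        for axis, value in ((right, s), (up, t)):
            for r in range(3):
                b[r] += axis[r] * value
                for c in range(3):
                    N[r][c] += axis[r] * axis[c]
    det = _det3(N)
    if abs(det) < eps:
        raise ValueError("views do not constrain all three axes")
    # Cramer's rule: replace column i of N by b.
    point = []
    for i in range(3):
        Ni = [[b[r] if c == i else N[r][c] for c in range(3)] for r in range(3)]
        point.append(_det3(Ni) / det)
    return tuple(point)
```

With a front view (right (1,0,0), up (0,1,0), target (1, 2)) and a side view (right (0,0,-1), up (0,1,0), target (-3, 2.2)), the two views disagree on height and the result is approximately (1.0, 2.1, 3.0), the average along the shared axis. A single view leaves depth unconstrained, so the sketch raises an error instead of guessing.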

Taxonomy

Topics: Interactive and Immersive Displays · 3D Shape Modeling and Analysis · Robot Manipulation and Learning

Methods: Contrastive Language-Image Pre-training · Sparse Evolutionary Training · ALIGN