ScribbleSense: Generative Scribble-Based Texture Editing with Intent Prediction
Yudi Zhang, Yeming Geng, Lei Zhang

TL;DR
ScribbleSense combines multimodal large language models (MLLMs) with image generation to make scribble-based texture editing of 3D models more intuitive, resolving ambiguity in both the editing intent and the target semantic location of a scribble.
Contribution
The authors present ScribbleSense as the first approach to integrate MLLMs and image generation for semantic intent prediction in scribble-based 3D texture editing.
Findings
Achieves state-of-the-art performance in interactive scribble-based texture editing.
Successfully resolves ambiguity in scribble instructions using MLLMs.
Enhances local texture detail extraction through global image generation.
Abstract
Interactive 3D model texture editing presents enhanced opportunities for creating 3D assets, with freehand drawing style offering the most intuitive experience. However, existing methods primarily support sketch-based interactions for outlining, while the utilization of coarse-grained scribble-based interaction remains limited. Furthermore, current methodologies often encounter challenges due to the abstract nature of scribble instructions, which can result in ambiguous editing intentions and unclear target semantic locations. To address these issues, we propose ScribbleSense, an editing method that combines multimodal large language models (MLLMs) and image generation models to effectively resolve these challenges. We leverage the visual capabilities of MLLMs to predict the editing intent behind the scribbles. Once the semantic intent of the scribble is discerned, we employ globally…
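The two stages named in the abstract (intent prediction with an MLLM, then global image generation used for a local edit) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `EditIntent`, `edit_view`, and the three callables `predict_intent`, `generate_image`, and `locate_region` are hypothetical names, and the final masking step is an assumption about how a globally generated image might be reduced to a local texture change.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]   # row-major RGB grid (a rendered view of the model)
Mask = List[List[bool]]     # True where a pixel is selected


@dataclass
class EditIntent:
    """Hypothetical container for what an MLLM infers from a scribble."""
    target: str        # semantic location, e.g. "left sleeve"
    description: str   # desired texture, e.g. "red plaid fabric"


def edit_view(
    view: Image,
    scribble: Mask,
    predict_intent: Callable[[Image, Mask], EditIntent],
    generate_image: Callable[[Image, EditIntent], Image],
    locate_region: Callable[[Image, EditIntent], Mask],
) -> Image:
    # Stage 1: an MLLM-style predictor reads the view and the scribble
    # and returns the inferred editing intent.
    intent = predict_intent(view, scribble)
    # Stage 2: an image generator produces a whole image conditioned on that intent.
    generated = generate_image(view, intent)
    # Stage 3 (assumed): keep generated pixels only inside the target region,
    # leaving the rest of the view unchanged.
    region = locate_region(view, intent)
    return [
        [g if m else v for v, g, m in zip(v_row, g_row, m_row)]
        for v_row, g_row, m_row in zip(view, generated, region)
    ]
```

Passing the models in as callables keeps the sketch independent of any particular MLLM or generator; in practice each would wrap a real model call, and the edited view would still need to be projected back onto the 3D texture.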
Taxonomy
Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Interactive and Immersive Displays
