TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts
Jingyu Zhuang, Di Kang, Yan-Pei Cao, Guanbin Li, Liang Lin, Ying Shan

TL;DR
TIP-Editor is a 3D scene editing framework that takes a text prompt, an image prompt, and a 3D bounding box, giving users precise control over the appearance and location of an edit. The authors report that it outperforms existing methods in editing quality and prompt alignment.
Contribution
The paper introduces TIP-Editor, a new 3D editing approach that combines text and image prompts with bounding boxes and uses 3D Gaussian splatting for precise local edits.
Findings
Achieves accurate 3D editing aligned with prompts and bounding boxes
Outperforms baselines in editing quality and prompt alignment
Utilizes a stepwise 2D personalization strategy and localization loss
Abstract
Text-driven 3D scene editing has gained significant attention owing to its convenience and user-friendliness. However, existing methods still lack accurate control of the specified appearance and location of the editing result due to the inherent limitations of the text description. To this end, we propose a 3D scene editing framework, TIP-Editor, that accepts both text and image prompts and a 3D bounding box to specify the editing region. With the image prompt, users can conveniently specify the detailed appearance/style of the target content in complement to the text description, enabling accurate control of the appearance. Specifically, TIP-Editor employs a stepwise 2D personalization strategy to better learn the representation of the existing scene and the reference image, in which a localization loss is proposed to encourage correct object placement as specified by the bounding box.…
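To make the role of the bounding box concrete, here is a minimal sketch in Python. It is not the paper's implementation: the function names `points_in_box` and `outside_mass_penalty` are hypothetical, the box is assumed to be axis-aligned, and the penalty is only a toy stand-in for the idea of a localization loss (discouraging a response map from placing mass outside the user-specified region). The exact loss used by TIP-Editor is not given in the abstract above.

```python
import numpy as np


def points_in_box(points, box_min, box_max):
    """Return a boolean mask of the 3D points inside an axis-aligned box.

    Assumption: the editing region is an axis-aligned box given by its
    minimum and maximum corners; boundaries are treated as inside.
    """
    points = np.asarray(points, dtype=float)
    lo = np.asarray(box_min, dtype=float)
    hi = np.asarray(box_max, dtype=float)
    return np.all((points >= lo) & (points <= hi), axis=1)


def outside_mass_penalty(response, region_mask, eps=1e-8):
    """Toy localization penalty (hypothetical, not the paper's loss).

    Returns the fraction of non-negative response mass that falls outside
    the region mask: 0 when all mass is inside, close to 1 when all of it
    is outside.
    """
    response = np.asarray(response, dtype=float)
    region_mask = np.asarray(region_mask, dtype=bool)
    total = response.sum() + eps  # eps avoids division by zero
    return float(response[~region_mask].sum() / total)


# Example: select scene points inside the editing box.
pts = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
inside = points_in_box(pts, box_min=[-1, -1, -1], box_max=[1, 1, 1])

# Example: a uniform 2x2 response map where only one pixel is in the region.
penalty = outside_mass_penalty(np.ones((2, 2)),
                               [[True, False], [False, False]])
```

In a real pipeline the 3D box would typically be projected into each rendered view to obtain a 2D region mask, and a differentiable version of the penalty would be added to the training objective; both steps are omitted here.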
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging
