TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts

Jingyu Zhuang; Di Kang; Yan-Pei Cao; Guanbin Li; Liang Lin; Ying Shan

arXiv:2401.14828·cs.CV·April 26, 2024·2 cites

Open Access

TL;DR

TIP-Editor is a 3D scene editing framework that combines text prompts, image prompts, and a 3D bounding box, giving precise control over both the appearance and the location of an edit. The authors report that it outperforms existing methods in accuracy and quality.

Contribution

The paper introduces TIP-Editor, a new 3D editing approach that combines text and image prompts with bounding boxes and uses 3D Gaussian splatting for precise local edits.

Findings

01

Achieves accurate 3D editing aligned with prompts and bounding boxes

02

Outperforms baselines in editing quality and prompt alignment

03

Utilizes a stepwise 2D personalization strategy and localization loss

Abstract

Text-driven 3D scene editing has gained significant attention owing to its convenience and user-friendliness. However, existing methods still lack accurate control of the specified appearance and location of the editing result due to the inherent limitations of the text description. To this end, we propose a 3D scene editing framework, TIP-Editor, that accepts both text and image prompts and a 3D bounding box to specify the editing region. With the image prompt, users can conveniently specify the detailed appearance/style of the target content in complement to the text description, enabling accurate control of the appearance. Specifically, TIP-Editor employs a stepwise 2D personalization strategy to better learn the representation of the existing scene and the reference image, in which a localization loss is proposed to encourage correct object placement as specified by the bounding box.…
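
The abstract says that a 3D bounding box specifies the editing region. As a rough illustration only, and not the authors' implementation, the sketch below assumes an axis-aligned box and treats each scene primitive as a single point center. The names `inside_box` and `editable_mask` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence

Point = Sequence[float]


def inside_box(point: Point, box_min: Point, box_max: Point) -> bool:
    """Return True if `point` lies inside the axis-aligned box (bounds inclusive)."""
    return all(lo <= p <= hi for p, lo, hi in zip(point, box_min, box_max))


def editable_mask(centers: Sequence[Point], box_min: Point, box_max: Point) -> List[bool]:
    """Mark which primitive centers fall inside the user-specified editing region.

    Assumption: an edit would only touch primitives whose mask entry is True,
    leaving the rest of the scene unchanged.
    """
    return [inside_box(c, box_min, box_max) for c in centers]


if __name__ == "__main__":
    # Hypothetical centers: two inside the unit box, one outside along x.
    centers = [(0.5, 0.5, 0.5), (2.0, 0.5, 0.5), (1.0, 1.0, 1.0)]
    print(editable_mask(centers, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)))
    # prints [True, False, True]
```

A real system would likely work with oriented boxes and with primitives that have spatial extent, so this point-in-box test is only the simplest possible stand-in for "editing region".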

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging