OBJECT 3DIT: Language-guided 3D-aware Image Editing
Oscar Michel, Anand Bhattad, Eli VanderBilt, Ranjay Krishna, Aniruddha Kembhavi, Tanmay Gupta

TL;DR
This paper introduces OBJECT 3DIT, a novel approach for language-guided, 3D-aware image editing that considers scene geometry and lighting, supported by a large synthetic dataset and models that generalize well to real images.
Contribution
The paper presents a new dataset and models for 3D-aware image editing guided by language instructions, addressing the gap in existing tools that ignore scene geometry.
Findings
Models understand 3D scene composition including lighting and shadows.
Training on synthetic data generalizes to real-world images.
Single and multi-task models demonstrate strong editing capabilities across four editing tasks.
Abstract
Existing image editing tools, while powerful, typically disregard the underlying 3D geometry from which the image is projected. As a result, edits made using these tools may become detached from the geometry and lighting conditions that are at the foundation of the image formation process. In this work, we formulate the new task of language-guided 3D-aware editing, where objects in an image should be edited according to a language instruction in the context of the underlying 3D scene. To promote progress towards this goal, we release OBJECT: a dataset consisting of 400K editing examples created from procedurally generated 3D scenes. Each example consists of an input image, an editing instruction in language, and the edited image. We also introduce 3DIT: single and multi-task models for four editing tasks. Our models show impressive abilities to understand the 3D composition of entire scenes,…
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Handwritten Text Recognition Techniques
