3D Object Manipulation in a Single Image using Generative Models
Ruisi Zhao, Zechuan Zhang, Zongxin Yang, Yi Yang

TL;DR
OMG3D is a novel framework that combines geometric control with diffusion models to enable realistic 3D object manipulation and motion editing in images, improving visual fidelity in static and dynamic scenarios.
Contribution
The paper introduces OMG3D, integrating 3D conversion, texture refinement, and lighting correction for improved image manipulation using diffusion models.
Findings
Enhanced visual realism in object editing and motion
Effective texture and lighting refinement modules
Operable with a single NVIDIA 3090 GPU
Abstract
Object manipulation in images aims not only to edit an object's presentation but also to endow objects with motion. Previous methods struggled to handle static editing and dynamic generation concurrently, and also struggled to achieve fidelity in object appearance and scene lighting. In this work, we introduce OMG3D, a novel framework that integrates precise geometric control with the generative power of diffusion models, thus achieving significant enhancements in visual performance. Our framework first converts 2D objects into 3D, enabling user-directed modifications and lifelike motions at the geometric level. To address texture realism, we propose CustomRefiner, a texture refinement module that pre-trains a customized diffusion model, aligning the details and style of coarse renderings of the rough 3D model with the original image, and further refines the texture.…
Peer Reviews
Decision: Submitted to ICLR 2025
- The proposed method uses explicit 3D generation to support both static and dynamic manipulation, which other 2D-based methods fail to do.
- The use of HDRi for realistic lighting.
- Visual quality is good.
- Lack of technical novelty: most of the presented techniques already exist. The proposed method seems to put them together nicely to produce a few good results. E.g., CustomRefiner is a combination of depth ControlNet, DreamBooth LoRA, and differentiable rendering on a UV map; IllumiCombiner is a combination of HDRi estimation and virtual-plane rendering.
- The idea of realistic lighting is interesting to me, but I am not convinced by the proposed method. The real light transport is more complex than by
They can combine precise geometric control. They can handle texture rendering better. They can handle lighting better. They offer a complete comparison of their results against other state-of-the-art methods.
They could try a comparison with a VSD loss, or provide more visual examples in the supplementary results.
+ Compared with Image Sculpting, OMG3D can produce results showing better texture alignment with the original image and achieve realistic light and shadow effects.
+ The idea of gradient backpropagation to the UV texture map through differentiable rasterization sounds novel.
+ The idea of estimating a spherical light from the background image and applying it in the rendering pipeline to achieve realistic shading and shadow sounds logical.
+ The qualitative results look convincing.
- A large part of the proposed pipeline comes from Image Sculpting. For instance, the "precise geometric control" is made possible by object segmentation followed by image-to-3D. The generative enhancement used in driving the UV-texture optimization also appears to be identical to that in Image Sculpting. This makes this work a bit incremental and lowers its novelty. Overall, this work can be regarded as an integration of Image Sculpting (for 3D model manipulation) and DiffusionLight (for intro
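The reviews above mention backpropagating gradients to a UV texture map through differentiable rendering. A minimal sketch of that general idea follows. It is not the authors' CustomRefiner: bilinear UV sampling stands in for a differentiable rasterizer, a fixed target image stands in for the diffusion-refined rendering, and all function names and parameters (`bilinear_weights`, `render`, `refine_texture`, `steps`, `lr`) are hypothetical.

```python
import numpy as np

def bilinear_weights(uv, size):
    """Texel indices and bilinear weights for UV samples in [0, 1]^2."""
    xy = uv * (size - 1)
    x0 = np.floor(xy[:, 0]).astype(int).clip(0, size - 2)
    y0 = np.floor(xy[:, 1]).astype(int).clip(0, size - 2)
    fx = xy[:, 0] - x0
    fy = xy[:, 1] - y0
    # Flattened indices of the four neighbouring texels of each sample.
    idx = np.stack([y0 * size + x0, y0 * size + x0 + 1,
                    (y0 + 1) * size + x0, (y0 + 1) * size + x0 + 1], axis=1)
    w = np.stack([(1 - fx) * (1 - fy), fx * (1 - fy),
                  (1 - fx) * fy, fx * fy], axis=1)
    return idx, w

def render(texture, idx, w):
    """'Render' one colour per sample by bilinear lookup (linear in the texture)."""
    flat = texture.reshape(-1, texture.shape[-1])
    return (flat[idx] * w[..., None]).sum(axis=1)

def refine_texture(texture, uv, target, steps=200, lr=1.0):
    """Gradient descent on 0.5 * mean squared error between render and target."""
    size = texture.shape[0]
    idx, w = bilinear_weights(uv, size)
    tex = texture.astype(float)  # astype returns a copy, input is untouched
    n = len(uv)
    losses = []
    for _ in range(steps):
        residual = render(tex, idx, w) - target
        losses.append(0.5 * float((residual ** 2).sum()) / n)
        # Backward pass: scatter each sample's residual onto its four texels.
        grad = np.zeros((size * size, tex.shape[-1]))
        np.add.at(grad, idx, w[..., None] * residual[:, None, :])
        tex -= lr * grad.reshape(tex.shape) / n
    return tex, losses

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    uv = rng.random((500, 2))            # hypothetical per-pixel UV coordinates
    reference = rng.random((8, 8, 3))    # stand-in for the refined appearance
    idx, w = bilinear_weights(uv, 8)
    target = render(reference, idx, w)
    tex, losses = refine_texture(np.zeros((8, 8, 3)), uv, target)
    print(f"loss: {losses[0]:.4f} -> {losses[-1]:.6f}")
```

Because the lookup is linear in the texels, the backward pass reduces to a weighted scatter-add. A real pipeline would obtain the same gradient from an autodiff framework and a rasterizer that handles visibility and mesh geometry.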
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Image Processing and 3D Reconstruction · 3D Shape Modeling and Analysis
Methods: Diffusion
