TL;DR
The paper introduces DGAD, a diffusion-based model that enables geometry-editable and appearance-preserving object composition by combining semantic embeddings with a cross-attention retrieval mechanism.
Contribution
DGAD is the first model to explicitly disentangle geometry editing and appearance preservation in object composition using diffusion models and cross-attention.
Findings
DGAD achieves superior geometric editing and appearance preservation on benchmarks.
The cross-attention retrieval mechanism effectively aligns fine-grained appearance features.
Experiments demonstrate DGAD's ability to produce realistic, geometry-adjusted composite images.
Abstract
General object composition (GOC) aims to seamlessly integrate a target object into a background scene with desired geometric properties, while simultaneously preserving its fine-grained appearance details. Recent approaches derive semantic embeddings and integrate them into advanced diffusion models to enable geometry-editable generation. However, these highly compact embeddings encode only high-level semantic cues and inevitably discard fine-grained appearance details. We introduce a Disentangled Geometry-editable and Appearance-preserving Diffusion (DGAD) model that first leverages semantic embeddings to implicitly capture the desired geometric transformations and then employs a cross-attention retrieval mechanism to align fine-grained appearance features with the geometry-edited representation, facilitating both precise geometry editing and faithful appearance preservation in object…
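The retrieval step described in the abstract can be illustrated with plain scaled dot-product cross-attention. The sketch below is not the authors' implementation: it assumes the geometry-edited representation supplies the queries and the object's appearance features supply the keys and values, and it omits the learned projections a real model would use. All names (`cross_attention_retrieve`, `geometry_queries`, `appearance_keys`, `appearance_values`) are hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention_retrieve(queries, keys, values):
    """For each query token, return a weighted mix of value vectors.

    Assumption: queries come from the geometry-edited representation,
    keys/values from fine-grained appearance features of the object.
    """
    scale = math.sqrt(len(keys[0]))
    retrieved, weights = [], []
    for q in queries:
        # Similarity of this query to every appearance token.
        w = softmax([dot(q, k) / scale for k in keys])
        weights.append(w)
        # Weighted sum of appearance values, one output vector per query.
        retrieved.append([
            sum(wi * v[j] for wi, v in zip(w, values))
            for j in range(len(values[0]))
        ])
    return retrieved, weights

# Toy data: 2 geometry-edited query tokens, 3 appearance tokens.
geometry_queries = [[1.0, 0.0], [0.0, 1.0]]
appearance_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
appearance_values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

retrieved, weights = cross_attention_retrieve(
    geometry_queries, appearance_keys, appearance_values
)
```

Each row of `weights` sums to 1, so every output token is a convex combination of appearance vectors. In this toy setup, that is how appearance detail gets placed at positions dictated by the edited geometry.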
Taxonomy
Topics: 3D Shape Modeling and Analysis · Image Processing and 3D Reconstruction · Robotics and Sensor-Based Localization
