TL;DR
BlenderFusion is a framework for 3D-grounded visual editing and scene compositing. It combines segmentation-based layering, editing in Blender, and a diffusion-based generative compositor to recompose objects, camera, and background.
Contribution
It introduces a layering-editing-compositing pipeline whose generative compositor extends a pre-trained diffusion model and is fine-tuned on video frames for scene editing.
Findings
Outperforms prior methods in complex scene editing tasks
Enables flexible background replacement and object manipulation
Provides disentangled control over objects and camera movements
Abstract
We present BlenderFusion, a generative visual compositing framework that synthesizes new scenes by recomposing objects, camera, and background. It follows a layering-editing-compositing pipeline: (i) segmenting and converting visual inputs into editable 3D entities (layering), (ii) editing them in Blender with 3D-grounded control (editing), and (iii) fusing them into a coherent scene using a generative compositor (compositing). Our generative compositor extends a pre-trained diffusion model to process both the original (source) and edited (target) scenes in parallel. It is fine-tuned on video frames with two key training strategies: (i) source masking, enabling flexible modifications like background replacement; (ii) simulated object jittering, facilitating disentangled control over objects and camera. BlenderFusion significantly outperforms prior methods in complex compositional scene…
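The two training strategies named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how either strategy is realized, so the code assumes that source masking means randomly zeroing the source-stream conditioning, and that object jittering means randomly perturbing an object's pose while the camera stays fixed. `ObjectPose`, `mask_source`, `jitter_object`, and all parameters are hypothetical names chosen for illustration.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class ObjectPose:
    """Hypothetical simplified object pose: position plus yaw in degrees."""
    x: float
    y: float
    z: float
    yaw_deg: float


def mask_source(source: List[float], p_mask: float,
                rng: random.Random) -> Tuple[List[float], bool]:
    """Assumed form of source masking: with probability p_mask, replace the
    source-stream conditioning with zeros, so the model must rely on the
    target (edited) stream alone."""
    if rng.random() < p_mask:
        return [0.0] * len(source), True
    return list(source), False


def jitter_object(pose: ObjectPose, max_shift: float, max_yaw_deg: float,
                  rng: random.Random) -> ObjectPose:
    """Assumed form of object jittering: perturb one object's pose while the
    camera is left untouched, so object motion is seen apart from camera
    motion."""
    return replace(
        pose,
        x=pose.x + rng.uniform(-max_shift, max_shift),
        y=pose.y + rng.uniform(-max_shift, max_shift),
        z=pose.z + rng.uniform(-max_shift, max_shift),
        yaw_deg=pose.yaw_deg + rng.uniform(-max_yaw_deg, max_yaw_deg),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for source-stream features; a real model would use tensors.
    masked, was_masked = mask_source([0.2, 0.7, 0.5], p_mask=0.5, rng=rng)
    jittered = jitter_object(ObjectPose(0.0, 0.0, 2.0, 0.0), 0.1, 5.0, rng)
    print(was_masked, masked)
    print(jittered)
```

Under these assumptions, a model trained with masked sources sometimes has to composite without any source reference, which is one way background replacement could become possible at inference time, and jittered objects give it examples where objects move but the camera does not.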
Peer Reviews
Decision · Submitted to ICLR 2026
Clear, production-like workflow; easy to implement. Results suggest better local control on some inserts.
**Key Baselines Omitted:**
- ZeroComp [1]: composites intrinsic layers (depth/normal/albedo/shading) and lets diffusion render the final image. It has a similar goal but does not use Blender directly; instead it uses a rendering engine to approximate 3D compositing.
- DiffusionRenderer [2]: turns G-buffers into photoreal images/videos; a direct alternative to “Blender render to diffusion fix.”
- 2D diffusion compositors: ObjectStitch [3], Thinking Outside the BBox [4], ControlCom [5], IMPRINT [6]: Generativ
1. Clear Motivation and Strong Problem Formulation: The paper clearly identifies a significant and practical limitation in current generative AI: the lack of precise, 3D-aware control for complex, multi-object scene compositing. It effectively positions its contribution against existing methods (Table 1), clearly highlighting the gap it aims to fill.
2. Novel and Elegant Framework Design: The primary strength of this work lies in its core idea of decoupling 3D control from generative synthesis.
1. Insufficient Detail on the Core Technical Novelty (Sec. 3.2): The paper's primary methodological contribution, the "Dual-stream Diffusion Compositor" in Section 3.2, is not described with sufficient clarity. The architecture is presented as a high-level black box, and the paper fails to provide a detailed diagram or explanation of the crucial "cross-stream interaction" mechanism. It is strongly recommended that the authors add a dedicated figure and more detailed text to fully articulate this
• The paper is well-written with a logical structure that makes the technical contributions easy to follow.
• The proposed framework is reasonable and well-justified. The experimental results convincingly demonstrate the effectiveness of the approach across various compositing scenarios.
• Excellent supplementary materials: The demo videos and project page significantly aid in understanding the core concepts and practical applications of the method.
• Recent works have explored 3D scene reconstruction and composition capabilities. A more thorough comparison and discussion of the relationship between BlenderFusion and these methods would strengthen the paper. For example:
• CAST [1] performs component-aligned 3D scene reconstruction from a single RGB image. How does BlenderFusion's layering approach compare to CAST's decomposition strategy?
• What are the trade-offs between the generative compositing approach and traditional 3D reconstructio
Taxonomy
Methods: Diffusion · Softmax · RoIAlign
