DiffusionHarmonizer: Bridging Neural Reconstruction and Photorealistic Simulation with Online Diffusion Enhancer

Yuxuan Zhang; Katarína Tóthová; Zian Wang; Kangxue Yin; Haithem Turki; Riccardo de Lutio; Yen-Yu Chang; Or Litany; Sanja Fidler; Zan Gojcic

arXiv:2602.24096 · cs.CV · March 6, 2026

TL;DR

DiffusionHarmonizer is an online generative framework that enhances neural scene renderings, reducing artifacts and improving realism for autonomous robot simulation, using a diffusion model adapted for real-time, scalable applications.

Contribution

The paper introduces an online enhancement method that transforms renderings of neural reconstructions into more realistic, temporally consistent outputs with fewer artifacts, suitable for real-time simulation.

Findings

01 Significantly reduces rendering artifacts in neural scene reconstructions.

02 Improves realism and temporal consistency of simulated scenes.

03 Operates efficiently on a single GPU in online environments.

Abstract

Simulation is essential to the development and evaluation of autonomous robots such as self-driving vehicles. Neural reconstruction is emerging as a promising solution, as it enables simulating a wide variety of scenarios from real-world data alone in an automated and scalable way. However, while methods such as NeRF and 3D Gaussian Splatting can produce visually compelling results, they often exhibit artifacts, particularly when rendering novel views, and fail to realistically integrate inserted dynamic objects, especially when those objects were captured from different scenes. To overcome these limitations, we introduce DiffusionHarmonizer, an online generative enhancement framework that transforms renderings from such imperfect scenes into temporally consistent outputs while improving their realism. At its core is a single-step temporally-conditioned enhancer that is converted from a pretrained…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis