TL;DR
This paper presents a hybrid method combining diffusion-based and image-to-image translation models to improve the photorealism of synthetic datasets generated by game engines, reducing the sim2real appearance gap.
Contribution
It proposes a hybrid approach that pairs a diffusion-based image generation and editing model (FLUX.2-4B Klein) with a traditional image-to-image translation model (REGEN) to make game-engine synthetic datasets more photorealistic.
Findings
REGEN outperforms FLUX.2-4B Klein in photorealism enhancement.
Combining the two models yields better realism than either model alone.
The hybrid approach maintains semantic consistency while improving visual quality.
Abstract
Video game engines have been an important source for generating large volumes of visual synthetic datasets for training and evaluating computer vision algorithms that are to be deployed in the real world. While the visual fidelity of modern game engines has been significantly improved with technologies such as ray-tracing, a notable sim2real appearance gap between the synthetic and the real-world images still remains, which limits the utilization of synthetic datasets in real-world applications. In this letter, we investigate the ability of a state-of-the-art image generation and editing diffusion model (FLUX.2-4B Klein) to enhance the photorealism of synthetic datasets and compare its performance against a traditional image-to-image translation model (REGEN). Furthermore, we propose a hybrid approach that combines the strong geometry and material transformations of diffusion-based…
