Consistent Multimodal Generation via A Unified GAN Framework
Zhen Zhu, Yijun Li, Weijie Lyu, Krishna Kumar Singh, Zhixin Shu, Soeren Pirk, Derek Hoiem

TL;DR
This paper introduces a unified GAN framework based on StyleGAN3 for generating multiple correlated image modalities like RGB, depth, and normals, ensuring realism and consistency across outputs.
Contribution
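It proposes a multimodal generation architecture with a shared backbone, modality-specific branches in the last layers of the synthesis network, per-modality fidelity discriminators, and a cross-modality consistency discriminator, enabling realistic and mutually consistent multimodal image synthesis.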
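To make the shared-backbone idea concrete, here is a minimal NumPy sketch of a generator in which one latent code feeds a common feature map and each modality gets its own small output branch. It is an illustration only, not the authors' StyleGAN3-based implementation: the class name `ToyMultimodalGenerator`, all layer sizes, and the use of plain linear projections are assumptions made for this example.

```python
import numpy as np

# Hypothetical sizes chosen only for illustration; none come from the paper.
LATENT_DIM = 8
FEATURE_DIM = 16
RESOLUTION = 4
MODALITY_CHANNELS = {"rgb": 3, "depth": 1, "normal": 3}


class ToyMultimodalGenerator:
    """Shared backbone followed by one small output branch per modality."""

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        # Shared backbone: maps a latent code to one feature map used by all branches.
        self.backbone = rng.normal(
            size=(LATENT_DIM, RESOLUTION * RESOLUTION * FEATURE_DIM)
        )
        # Modality-specific branches: one linear projection per output modality.
        self.branches = {
            name: rng.normal(size=(FEATURE_DIM, channels))
            for name, channels in MODALITY_CHANNELS.items()
        }

    def __call__(self, z):
        features = np.tanh(z @ self.backbone).reshape(
            RESOLUTION, RESOLUTION, FEATURE_DIM
        )
        # Every branch reads the same features, so all outputs derive from
        # one shared scene representation.
        return {name: np.tanh(features @ w) for name, w in self.branches.items()}


generator = ToyMultimodalGenerator()
z = np.random.default_rng(1).normal(size=LATENT_DIM)
outputs = generator(z)
for name, image in outputs.items():
    print(name, image.shape)
```

Because the branches diverge only after the shared features are computed, a single latent code yields one RGB, one depth, and one normal map that all derive from the same intermediate representation, which is the property the consistency discriminator is meant to enforce during training.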
Findings
Achieves realistic and consistent generation of RGB, depth, and normal images on the Stanford2D3D dataset.
Provides a training recipe for extending the pretrained model to a new domain with limited paired data.
Evaluates synthetically generated RGB and depth pairs for training or fine-tuning depth estimators.
Abstract
We investigate how to generate multimodal image outputs, such as RGB, depth, and surface normals, with a single generative model. The challenge is to produce outputs that are both realistic and consistent with each other. Our solution builds on the StyleGAN3 architecture, with a shared backbone and modality-specific branches in the last layers of the synthesis network, and we propose per-modality fidelity discriminators and a cross-modality consistency discriminator. In experiments on the Stanford2D3D dataset, we demonstrate realistic and consistent generation of RGB, depth, and normal images. We also show a training recipe to easily extend our pretrained model to a new domain, even with only a few paired samples. We further evaluate the use of synthetically generated RGB and depth pairs for training or fine-tuning depth estimators. Code will be available at…
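One way the per-modality fidelity discriminators and the cross-modality consistency discriminator could enter the generator objective is sketched below. The abstract does not state the loss, so this is an assumption: it uses the non-saturating logistic loss common in the StyleGAN family, and the function name `generator_loss`, the `consistency_weight` parameter, and the example logits are hypothetical.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def generator_loss(fidelity_logits, consistency_logit, consistency_weight=1.0):
    """Combine discriminator outputs into one generator loss (assumed form).

    fidelity_logits: dict mapping modality name to that modality's
        discriminator logit for the generated image.
    consistency_logit: logit from a discriminator that sees all
        modalities together.
    consistency_weight: hypothetical balancing factor, not from the paper.
    """
    # Non-saturating logistic loss: low when a discriminator rates the sample as real.
    fidelity = sum(softplus(-logit) for logit in fidelity_logits.values())
    consistency = softplus(-consistency_logit)
    return fidelity + consistency_weight * consistency


# Made-up logits: same per-modality realism, different cross-modal agreement.
logits = {"rgb": 2.0, "depth": 0.5, "normal": -1.0}
loss_consistent = generator_loss(logits, consistency_logit=3.0)
loss_inconsistent = generator_loss(logits, consistency_logit=-3.0)
print(loss_consistent < loss_inconsistent)
```

Under this formulation, a sample whose modalities each look realistic in isolation is still penalized when the joint discriminator judges them mutually inconsistent, which is why a separate consistency term is useful alongside the fidelity terms.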
Taxonomy
Topics: Advanced Vision and Imaging · Video Analysis and Summarization · Image Processing Techniques and Applications
