TL;DR
This paper introduces two novel cGAN architectures, X-Fork and X-Seq, for cross-view image synthesis, effectively generating natural scenes and semantic maps across aerial and street views with improved semantic consistency.
Contribution
The paper proposes two new architectures, X-Fork and X-Seq, specifically designed for cross-view image synthesis, enhancing semantic preservation and image quality over traditional methods.
Findings
Both architectures generate images at 64x64 and 256x256 pixels with accurate semantics.
X-Seq produces sharper images by using feedback from semantic segmentation.
The methods outperform state-of-the-art approaches in qualitative and quantitative evaluations.
Abstract
Learning to generate natural scenes has always been a challenging task in computer vision. It is even more painstaking when the generation is conditioned on images with drastically different views. This is mainly because understanding, corresponding, and transforming appearance and semantic information across the views is not trivial. In this paper, we attempt to solve the novel problem of cross-view image synthesis, aerial to street-view and vice versa, using conditional generative adversarial networks (cGAN). Two new architectures called Crossview Fork (X-Fork) and Crossview Sequential (X-Seq) are proposed to generate scenes with resolutions of 64x64 and 256x256 pixels. X-Fork architecture has a single discriminator and a single generator. The generator hallucinates both the image and its semantic segmentation in the target view. X-Seq architecture utilizes two cGANs. The first one…
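The forked-generator idea in the abstract (one generator emitting both the target-view image and its semantic segmentation) can be illustrated with a toy sketch. This is not the authors' network: the class name `ForkedGenerator`, the dense layers, the layer sizes, and the placement of the fork after a single shared trunk are all assumptions chosen for brevity, and the discriminator and training loop are omitted.

```python
import numpy as np


class ForkedGenerator:
    """Toy stand-in: one shared trunk, two output heads (hypothetical layout)."""

    def __init__(self, in_dim, hidden_dim, n_pixels, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.n_pixels = n_pixels
        self.n_classes = n_classes
        # Shared trunk weights, used by both heads.
        self.w_trunk = rng.normal(0.0, 0.1, (in_dim, hidden_dim))
        # Head 1: RGB value per pixel of the target-view image.
        self.w_image = rng.normal(0.0, 0.1, (hidden_dim, n_pixels * 3))
        # Head 2: class scores per pixel of the target-view segmentation.
        self.w_seg = rng.normal(0.0, 0.1, (hidden_dim, n_pixels * n_classes))

    def __call__(self, source_view):
        # source_view: (batch, in_dim) flattened source-view features.
        h = np.maximum(source_view @ self.w_trunk, 0.0)  # ReLU
        # Image head: tanh keeps pixel values in [-1, 1].
        image = np.tanh(h @ self.w_image).reshape(-1, self.n_pixels, 3)
        # Segmentation head: softmax over classes for each pixel.
        logits = (h @ self.w_seg).reshape(-1, self.n_pixels, self.n_classes)
        logits = logits - logits.max(axis=-1, keepdims=True)
        probs = np.exp(logits)
        probs = probs / probs.sum(axis=-1, keepdims=True)
        return image, probs


# Usage: a batch of 2 fake "aerial" inputs mapped to a 16-pixel "street" output.
gen = ForkedGenerator(in_dim=48, hidden_dim=32, n_pixels=16, n_classes=4)
aerial = np.random.default_rng(1).normal(size=(2, 48))
street_image, street_seg = gen(aerial)
print(street_image.shape, street_seg.shape)
```

Both outputs come from the same hidden representation, so a loss on either head updates the shared trunk; that sharing is the point of the sketch, not the specific layers.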
