Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation

Hao Tang; Dan Xu; Nicu Sebe; Yanzhi Wang; Jason J. Corso; Yan Yan

arXiv:1904.06807 · cs.CV · April 18, 2019 · 25 cites

TL;DR

This paper introduces SelectionGAN, a two-stage generative model that leverages semantic maps and multi-channel attention to improve cross-view image translation, producing more accurate and realistic scene images from different viewpoints.

Contribution

The paper proposes a novel two-stage SelectionGAN with multi-channel attention and semantic guidance for enhanced cross-view image translation.

Findings

01 Outperforms state-of-the-art methods on multiple datasets

02 Generates more accurate and realistic images from different viewpoints

03 Utilizes uncertainty-guided pixel loss for better optimization

Abstract

Cross-view image translation is challenging because it involves images with drastically different views and severe deformation. In this paper, we propose a novel approach named Multi-Channel Attention SelectionGAN (SelectionGAN) that makes it possible to generate images of natural scenes in arbitrary viewpoints, based on an image of the scene and a novel semantic map. The proposed SelectionGAN explicitly utilizes the semantic information and consists of two stages. In the first stage, the condition image and the target semantic map are fed into a cycled semantic-guided generation network to produce initial coarse results. In the second stage, we refine the initial results by using a multi-channel attention selection mechanism. Moreover, uncertainty maps automatically learned from attentions are used to guide the pixel loss for better network optimization. Extensive experiments on…
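
As a rough illustration of the second stage described above, the sketch below fuses several candidate images with per-pixel attention weights and then scores the result with an uncertainty-weighted L1 loss. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names, the softmax normalisation across candidates, and the loss form `|error| / U + log U` (a common way to weight a pixel loss by a learned uncertainty map) are all assumptions that this page does not confirm. The uncertainty map is simply passed in as an input here, whereas the abstract says it is learned from the attention maps.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention_select(candidates, attention_logits):
    """Fuse N candidate images with per-pixel attention weights.

    candidates:       array of shape (N, H, W, C), hypothetical intermediate images
    attention_logits: array of shape (N, H, W), one unnormalised map per candidate
    Returns the fused image (H, W, C) and the normalised weights (N, H, W).
    """
    # Assumption: weights are normalised across the N candidates at each pixel.
    weights = softmax(attention_logits, axis=0)
    fused = (weights[..., None] * candidates).sum(axis=0)
    return fused, weights


def uncertainty_weighted_l1(pred, target, uncertainty, eps=1e-6):
    """Assumed loss form: per-pixel L1 error divided by U, plus log U.

    uncertainty: positive array of shape (H, W). High-uncertainty pixels
    contribute less error, while the log term discourages U from growing
    without bound.
    """
    u = np.maximum(uncertainty, eps)                 # guard against division by zero
    per_pixel = np.abs(pred - target).mean(axis=-1)  # average over colour channels
    return float((per_pixel / u + np.log(u)).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    candidates = rng.random((4, 8, 8, 3))   # 4 toy candidate images
    logits = rng.normal(size=(4, 8, 8))     # 4 toy attention maps
    target = rng.random((8, 8, 3))
    uncertainty = np.ones((8, 8))           # U = 1 reduces to plain mean L1

    fused, weights = attention_select(candidates, logits)
    print(fused.shape, uncertainty_weighted_l1(fused, target, uncertainty))
```

Because the weights sum to one at every pixel, each fused pixel is a convex combination of the candidate pixels, so the selection step can pick different candidates for different image regions without leaving their value range.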

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Multimodal Machine Learning Applications