TL;DR
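This paper introduces In2I, a GAN-based framework for unsupervised image-to-image translation from multiple input modalities. The authors report that using multiple inputs generally improves the visual quality of the translated images and that the method outperforms current state-of-the-art unsupervised image-to-image translation methods.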
Contribution
It extends unsupervised image-to-image translation to the multiple-input setting, introducing a multi-modal generator structure and a new loss term, the latent consistency loss.
Findings
Leveraging multiple input modalities generally improves the visual quality of the translated images.
The method outperforms current state-of-the-art unsupervised image-to-image translation methods.
Abstract
In unsupervised image-to-image translation, the goal is to learn the mapping between an input image and an output image using a set of unpaired training images. In this paper, we propose an extension of the unsupervised image-to-image translation problem to the multiple-input setting. Given a set of paired images from multiple modalities, a transformation is learned to translate the input into a specified domain. For this purpose, we introduce a Generative Adversarial Network (GAN)-based framework along with a multi-modal generator structure and a new loss term, the latent consistency loss. Through various experiments, we show that leveraging multiple inputs generally improves the visual quality of the translated images. Moreover, we show that the proposed method outperforms current state-of-the-art unsupervised image-to-image translation methods.
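The abstract names a multi-modal generator and a latent consistency loss but does not define either. The following is a minimal sketch of one plausible reading, not the authors' formulation: each input modality is encoded separately, the codes are fused, and the loss penalizes the L1 gap between the fused code and the code obtained by re-encoding the translated output. The averaging fusion, the function names (`fuse_latents`, `latent_consistency_loss`), and the toy encoders and decoder are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def l1_distance(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def fuse_latents(latents: Sequence[Vector]) -> Vector:
    """Fuse per-modality latent codes by averaging (assumed fusion rule)."""
    n = len(latents)
    return [sum(vals) / n for vals in zip(*latents)]


def latent_consistency_loss(
    inputs: Sequence[Vector],
    encoders: Sequence[Callable[[Vector], Vector]],
    decoder: Callable[[Vector], Vector],
    reverse_encoder: Callable[[Vector], Vector],
) -> float:
    """L1 gap between the input latent and the re-encoded output latent.

    Assumption: this is one plausible form of a latent consistency term;
    the source does not give the actual definition.
    """
    # Encode each modality with its own encoder, then fuse the codes.
    z_in = fuse_latents([enc(x) for enc, x in zip(encoders, inputs)])
    # Translate to the target domain and encode the result again.
    translated = decoder(z_in)
    z_out = reverse_encoder(translated)
    return l1_distance(z_in, z_out)


# Toy stand-ins for networks (hypothetical): identity encoders,
# a decoder that doubles, and a reverse encoder that halves.
identity = lambda v: list(v)
double = lambda z: [2.0 * v for v in z]
halve = lambda y: [v / 2.0 for v in y]

two_modalities = [[1.0, 2.0], [3.0, 4.0]]
consistent = latent_consistency_loss(two_modalities, [identity, identity], double, halve)
inconsistent = latent_consistency_loss(two_modalities, [identity, identity], double, identity)
```

In this toy setup the loss is zero when the reverse encoder exactly inverts the decoder and positive otherwise, which is the behavior a consistency term of this kind is meant to reward.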
