TL;DR
This paper shows that simple linear encoder-decoder architectures can perform unsupervised image-to-image translation effectively: they match deep models on local tasks in a fraction of the training time, and they succeed on nonlocal tasks where deep models fail.
Contribution
The paper introduces linear architectures for unsupervised image translation, showing they are easier to train and can outperform deep models on certain tasks.
Findings
Linear models are faster to train and achieve comparable results on local problems.
Linear models succeed on nonlocal transformations where deep models fail.
Deep models succeed largely because of a strong locality bias; once that bias is removed, they become too powerful and may fail to learn even simple local transformations.
Abstract
Unsupervised image-to-image translation is an inherently ill-posed problem. Recent methods based on deep encoder-decoder architectures have shown impressive results, but we show that they only succeed due to a strong locality bias, and they fail to learn very simple nonlocal transformations (e.g., mapping upside-down faces to upright faces). When the locality bias is removed, the methods are too powerful and may fail to learn simple local transformations. In this paper we introduce linear encoder-decoder architectures for unsupervised image-to-image translation. We show that learning is much easier and faster with these architectures and yet the results are surprisingly effective. In particular, we show a number of local problems for which the results of the linear methods are comparable to those of state-of-the-art architectures but with a fraction of the training time, and a number of…
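The abstract's example of a nonlocal transformation, turning an image upside down, is itself a linear map on the flattened pixel vector, which is one way to see why a linear model can represent it. The sketch below only illustrates that point on a toy 3x2 image; it is not the paper's training procedure, and the names `flip_matrix`, `apply_linear`, and `flatten` are hypothetical helpers introduced here.

```python
def flip_matrix(height, width):
    """Build the (H*W) x (H*W) permutation matrix that flips an image upside down."""
    n = height * width
    P = [[0] * n for _ in range(n)]
    for r in range(height):
        for c in range(width):
            src = r * width + c                 # index of pixel (r, c) in the flattened image
            dst = (height - 1 - r) * width + c  # same column, mirrored row
            P[dst][src] = 1
    return P


def apply_linear(P, x):
    """Matrix-vector product P @ x using plain Python lists."""
    return [sum(p * v for p, v in zip(row, x)) for row in P]


def flatten(img):
    """Row-major flattening of a 2D list."""
    return [v for row in img for v in row]


# Toy 3x2 "image" (assumed data, for illustration only).
image = [[1, 2],
         [3, 4],
         [5, 6]]

P = flip_matrix(3, 2)
flipped = apply_linear(P, flatten(image))

# Nonlocal: the first output pixel is copied from the last row of the input.
# Linear: flipping a sum of images equals the sum of the flipped images.
a = flatten(image)
b = [10, 20, 30, 40, 50, 60]
lhs = apply_linear(P, [u + v for u, v in zip(a, b)])
rhs = [u + v for u, v in zip(apply_linear(P, a), apply_linear(P, b))]
```

Each output pixel here depends on an input pixel far from its own position, so an architecture biased toward local neighborhoods has no direct way to express the map, while a single dense matrix does.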
