Harnessing the Conditioning Sensorium for Improved Image Translation

Cooper Nederhood; Nicholas Kolkin; Deqing Fu; Jason Salavon

arXiv:2110.06443·cs.CV·October 14, 2021

Open Access

TL;DR

This paper introduces Sensorium, a method for multi-modal image translation that uses pre-trained models to define content, enabling more flexible, higher-quality translations across diverse and complex scenes.

Contribution

It proposes a novel approach that uses off-the-shelf pre-trained models to define content, simplifying training and enhancing control over image translation.

Findings

1. Outperforms existing methods on traditional datasets such as CelebA-HQ.
2. Remains effective on complex scenes in the new ClassicTV and FFHQ-Wild datasets.
3. Provides intuitive control over how much content is preserved during translation.

Abstract

Multi-modal domain translation typically refers to synthesizing a novel image that inherits certain localized attributes from a 'content' image (e.g. layout, semantics, or geometry), and inherits everything else (e.g. texture, lighting, sometimes even semantics) from a 'style' image. The dominant approach to this task is to attempt to learn disentangled 'content' and 'style' representations from scratch. However, this is not only challenging but also ill-posed, as what users wish to preserve during translation varies depending on their goals. Motivated by this inherent ambiguity, we define 'content' based on conditioning information extracted by off-the-shelf pre-trained models. We then train our style extractor and image decoder with an easy-to-optimize set of reconstruction objectives. The wide variety of high-quality pre-trained models available and the simple training procedure make our…
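The training setup described in the abstract (a frozen content extractor, with a style extractor and decoder trained by reconstruction) can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and is not the paper's implementation: the "pre-trained model" is replaced by simple per-image normalization, the style is just a global mean and standard deviation, the decoder has a single learnable `gain`, and all function names are hypothetical. The actual method presumably uses neural networks throughout.

```python
from statistics import mean, pstdev


def flatten(image):
    """Flatten a 2-D list of pixel values into a 1-D list."""
    return [p for row in image for p in row]


def frozen_content_extractor(image):
    # Hypothetical stand-in for an off-the-shelf pre-trained model.
    # It is never updated during training. Normalizing keeps the spatial
    # layout and discards global brightness and contrast.
    flat = flatten(image)
    mu, sigma = mean(flat), pstdev(flat) or 1.0
    return [[(p - mu) / sigma for p in row] for row in image]


def style_extractor(image):
    # Toy style code: global mean and standard deviation (an assumption).
    flat = flatten(image)
    return mean(flat), pstdev(flat)


def decoder(content, style, gain):
    # Toy decoder with one learnable parameter, `gain`.
    mu, sigma = style
    return [[gain * c * sigma + mu for c in row] for row in content]


def reconstruction_loss(output, target):
    """Mean squared error between two images of equal size."""
    return mean((o - t) ** 2 for o, t in zip(flatten(output), flatten(target)))


def train_gain(images, steps=50, lr=1.0, gain=0.0):
    # Plain gradient descent on the reconstruction loss:
    # decode(content(x), style(x)) should give back x.
    for _ in range(steps):
        for image in images:
            content = frozen_content_extractor(image)
            mu, sigma = style_extractor(image)
            # Analytic gradient of mean((gain*c*sigma + mu - x)^2) w.r.t. gain.
            grad = mean(
                2 * (gain * c * sigma + mu - x) * c * sigma
                for c, x in zip(flatten(content), flatten(image))
            )
            gain -= lr * grad
    return gain


content_image = [[0.0, 1.0], [1.0, 0.0]]  # checkerboard layout
style_image = [[0.2, 0.4], [0.6, 0.8]]    # lower-contrast gradient

gain = train_gain([content_image, style_image])

# Translation: layout from one image, global statistics from the other.
translated = decoder(
    frozen_content_extractor(content_image),
    style_extractor(style_image),
    gain,
)
```

After training, `translated` keeps the checkerboard layout of `content_image` while taking on the mean and contrast of `style_image`. Swapping `frozen_content_extractor` for a different function changes what counts as content without changing the training loop, which is the kind of flexibility the abstract attributes to defining content through pre-trained models.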

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Advanced Vision and Imaging