Harnessing the Conditioning Sensorium for Improved Image Translation
Cooper Nederhood, Nicholas Kolkin, Deqing Fu, Jason Salavon

TL;DR
This paper introduces Sensorium, a method for multi-modal image translation that uses pre-trained models to define content, enabling more flexible, higher-quality translations across diverse and complex scenes.
Contribution
The paper defines content using off-the-shelf pre-trained models, which simplifies training and gives users more control over image translation.
Findings
Outperforms existing methods on traditional datasets like CelebA-HQ.
Effective on complex scenes in the new ClassicTV and FFHQ-Wild datasets.
Provides intuitive control over content preservation during translation.
Abstract
Multi-modal domain translation typically refers to synthesizing a novel image that inherits certain localized attributes from a 'content' image (e.g. layout, semantics, or geometry), and inherits everything else (e.g. texture, lighting, sometimes even semantics) from a 'style' image. The dominant approach to this task is attempting to learn disentangled 'content' and 'style' representations from scratch. However, this is not only challenging, but ill-posed, as what users wish to preserve during translation varies depending on their goals. Motivated by this inherent ambiguity, we define 'content' based on conditioning information extracted by off-the-shelf pre-trained models. We then train our style extractor and image decoder with an easy to optimize set of reconstruction objectives. The wide variety of high-quality pre-trained models available and simple training procedure makes our…
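The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' architecture: the "image" is a 1-D list of floats, `frozen_content` is a hypothetical stand-in for an off-the-shelf pre-trained model, and `style_extractor` and `decoder` are fixed closed-form functions standing in for the networks that the paper trains. All names are assumptions made for illustration.

```python
from statistics import mean, pstdev

EPS = 1e-8  # avoids division by zero for constant signals


def frozen_content(image):
    """Hypothetical stand-in for a frozen pre-trained model.

    Keeps the layout (relative ordering and shape) of the signal and
    discards global brightness and contrast.
    """
    mu, sigma = mean(image), pstdev(image)
    return [(x - mu) / (sigma + EPS) for x in image]


def style_extractor(image):
    """Toy style code: global mean and standard deviation.

    In the paper this component is learned; here it is fixed.
    """
    return mean(image), pstdev(image)


def decoder(content, style):
    """Toy decoder: re-applies the style statistics to the content."""
    mu, sigma = style
    return [c * sigma + mu for c in content]


def reconstruction_loss(image):
    """Mean squared error when content and style come from the same image."""
    recon = decoder(frozen_content(image), style_extractor(image))
    return mean([(r - x) ** 2 for r, x in zip(recon, image)])


def translate(content_image, style_image):
    """Combine content from one image with style from another."""
    return decoder(frozen_content(content_image), style_extractor(style_image))


if __name__ == "__main__":
    a = [0.0, 1.0, 2.0, 3.0]      # supplies the layout
    b = [10.0, 10.0, 30.0, 30.0]  # supplies the global statistics
    print(reconstruction_loss(a))
    print(translate(a, b))
```

The point of the sketch is the division of labour: the content definition is fixed up front, so the only objective the remaining components need is to reconstruct an image from its own content and style. Swapping `frozen_content` for a different function changes what is preserved during translation without changing that objective.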
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Advanced Vision and Imaging
