Multi-View Image-to-Image Translation Supervised by 3D Pose
Idit Diamant, Oranit Dror, Hai Victor Habi, Arnon Netzer

TL;DR
This paper introduces an end-to-end multi-view image translation framework that uses 3D pose constraints to generate consistent, photo-realistic person images across multiple viewpoints with new poses.
Contribution
It proposes a novel joint learning approach for unpaired image translation models guided by 3D human pose constraints to ensure multi-view pose consistency.
Findings
Generated images are more pose-consistent across views than those of a standard image-to-image baseline
Generated person images are photo-realistic on the CMU-Panoptic dataset
The framework generates images of persons in new poses across all views
Abstract
We address the task of multi-view image-to-image translation for person image generation. The goal is to synthesize photo-realistic multi-view images with pose consistency across all views. Our proposed end-to-end framework is based on joint learning of multiple unpaired image-to-image translation models, one per camera viewpoint. Joint learning is enforced through constraints on the shared 3D human pose, which encourage the 2D pose projections in all views to be consistent. Experimental results on the CMU-Panoptic dataset demonstrate the effectiveness of the proposed framework in generating photo-realistic images of persons with new poses that are more consistent across all views than those of a standard image-to-image baseline. The code is available at: https://github.com/sony-si/MultiView-Img2Img
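The abstract does not spell out the form of the 3D pose constraint, so the following is only a minimal sketch of one common way to measure multi-view pose consistency: project a shared 3D pose into each camera and compare it with the 2D pose observed in that view. The `Camera` class, the pinhole model, and the mean-squared-error form of `consistency_loss` are all assumptions made for illustration; they are not taken from the paper or its repository.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


@dataclass
class Camera:
    """Hypothetical pinhole camera with x_cam = R @ x_world + t (an assumption)."""
    R: Sequence[Sequence[float]]  # 3x3 rotation, world -> camera
    t: Sequence[float]            # translation, length 3
    f: float                      # focal length in pixels
    cx: float                     # principal point, x
    cy: float                     # principal point, y


def project(cam: Camera, joint: Point3D) -> Point2D:
    """Project one 3D joint into the image plane of `cam`."""
    xc, yc, zc = (
        sum(cam.R[i][k] * joint[k] for k in range(3)) + cam.t[i]
        for i in range(3)
    )
    if zc <= 0:
        raise ValueError("joint lies behind the camera")
    return (cam.f * xc / zc + cam.cx, cam.f * yc / zc + cam.cy)


def consistency_loss(
    cameras: List[Camera],
    pose_3d: List[Point3D],
    poses_2d: List[List[Point2D]],
) -> float:
    """Mean squared distance between the projected shared 3D pose
    and the 2D pose observed in each view (illustrative loss only)."""
    total = 0.0
    count = 0
    for cam, pose_2d in zip(cameras, poses_2d):
        for joint, (u, v) in zip(pose_3d, pose_2d):
            pu, pv = project(cam, joint)
            total += (pu - u) ** 2 + (pv - v) ** 2
            count += 1
    return total / count
```

In a training setup along these lines, `poses_2d` would come from a 2D pose estimator applied to each generated view, and the loss would be zero only when every view agrees with the same underlying 3D pose. A real implementation would use a tensor library so that gradients can flow back to the generators.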
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Advanced Image Processing Techniques
