Depth self-supervision for single image novel view synthesis
Giovanni Minelli, Matteo Poggi, Samuele Salti

TL;DR
This paper introduces a self-supervised framework that jointly optimizes for novel view synthesis and depth estimation from a single image, leading to improved image quality and depth accuracy.
Contribution
It proposes a shared depth decoder trained in a self-supervised manner to enhance both view synthesis and depth estimation simultaneously.
Findings
Higher-quality generated images
More accurate depth maps for target views
Effective joint optimization of view synthesis and depth estimation
Abstract
In this paper, we tackle the problem of generating a novel image from an arbitrary viewpoint given a single frame as input. While existing methods operating in this setup predict the target-view depth map to guide the synthesis without explicit supervision over that task, we jointly optimize our framework for both novel view synthesis and depth estimation to best exploit the synergy between the two. Specifically, a shared depth decoder is trained in a self-supervised manner to predict depth maps that are consistent across the source and target views. Our results demonstrate the effectiveness of our approach in addressing the challenges of both tasks, allowing for higher-quality generated images as well as more accurate depth for the target viewpoint.
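The abstract does not give the training objective, so the following is only a minimal sketch of one common way to measure whether two depth maps are "consistent across the source and target views". It assumes a pinhole camera with known intrinsics and a known relative pose. The function name `depth_consistency_loss` and all parameter names are hypothetical and are not taken from the paper. Each source pixel is lifted to 3D with its predicted depth, moved into the target camera frame, and its depth there is compared with the target depth map at the pixel it projects to.

```python
import numpy as np

def depth_consistency_loss(depth_src, depth_tgt, K, R, t):
    """Mean absolute disagreement between source depth warped into the
    target view and the target depth map (illustrative, not the paper's loss).

    depth_src, depth_tgt: (h, w) depth maps for the two views.
    K: (3, 3) pinhole intrinsics, assumed shared by both views.
    R, t: rotation (3, 3) and translation (3,) mapping source-frame points
          to the target frame, X_tgt = R @ X_src + t.
    """
    h, w = depth_src.shape
    # Homogeneous pixel coordinates (u, v, 1) for every source pixel.
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)]).astype(np.float64)
    # Back-project to 3D points in the source camera frame.
    pts_src = (np.linalg.inv(K) @ pix) * depth_src.ravel()
    # Express the same points in the target camera frame.
    pts_tgt = R @ pts_src + np.asarray(t, dtype=np.float64).reshape(3, 1)
    z = pts_tgt[2]
    # Project into the target image, guarding against points behind the camera.
    in_front = z > 1e-6
    z_safe = np.where(in_front, z, 1.0)
    proj = K @ pts_tgt
    u_t = np.rint(proj[0] / z_safe).astype(int)
    v_t = np.rint(proj[1] / z_safe).astype(int)
    valid = in_front & (u_t >= 0) & (u_t < w) & (v_t >= 0) & (v_t < h)
    if not valid.any():
        return 0.0
    # Nearest-neighbour lookup of the target depth at the projected pixels.
    sampled = depth_tgt[v_t[valid], u_t[valid]]
    return float(np.mean(np.abs(z[valid] - sampled)))

# Toy scene: a fronto-parallel plane 5 units away; the target camera
# has moved 1 unit forward, so the plane is 4 units away in its frame.
K = np.array([[50.0, 0.0, 4.0], [0.0, 50.0, 4.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, -1.0])
depth_src = np.full((8, 8), 5.0)
consistent = depth_consistency_loss(depth_src, np.full((8, 8), 4.0), K, R, t)
inconsistent = depth_consistency_loss(depth_src, np.full((8, 8), 5.0), K, R, t)
```

In a learned setting the nearest-neighbour lookup would typically be replaced by differentiable bilinear sampling, and occluded pixels would need masking. Both are omitted here to keep the geometry readable.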
Taxonomy
Topics: Advanced Vision and Imaging · Image Enhancement Techniques · Advanced Image Processing Techniques
