TL;DR
This paper introduces a depth estimation method from multi-view images that leverages previous scene encodings using a Gaussian process prior, enabling real-time performance on smart devices.
Contribution
It proposes a novel multi-view stereo model with a nonparametric Gaussian process prior for temporal fusion, trained end-to-end for improved depth estimation.
Findings
Effective depth estimation from multi-view images using temporal information.
Real-time performance achieved on smart devices.
Flexible Gaussian process prior adapts memory from previous views.
Abstract
We propose a novel idea for depth estimation from multi-view image-pose pairs, where the model can leverage information from previous latent-space encodings of the scene. The model takes pairs of images and poses, which are passed through an encoder-decoder model for disparity estimation. The novelty lies in soft-constraining the bottleneck layer with a nonparametric Gaussian process (GP) prior. We propose a pose-kernel structure that encourages similar poses to have similar latent-space representations. The flexibility of the GP prior provides an adaptive memory for fusing information from previous views. We train the encoder-decoder and the GP hyperparameters jointly, end-to-end. In addition to a batch method, we derive a lightweight estimation scheme that circumvents standard pitfalls in scaling Gaussian process inference, and demonstrate how our scheme can run in…
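The pose-kernel idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the pose distance (translation difference plus a rotation trace term), the Matérn 3/2 covariance, the noise variance, and all function names below are assumptions chosen for illustration. The sketch covers only the batch setting, which needs a cubic-cost solve in the number of views; the lightweight scheme mentioned in the abstract is not shown.

```python
import numpy as np

def pose_distance(t_i, R_i, t_j, R_j):
    # Assumed distance between two camera poses (translation t, rotation R):
    # squared translation difference plus a trace-based rotation term.
    trans = np.sum((t_i - t_j) ** 2)
    rot = (2.0 / 3.0) * np.trace(np.eye(3) - R_i.T @ R_j)
    return np.sqrt(max(trans + rot, 0.0))

def matern32(d, magnitude=1.0, lengthscale=1.0):
    # Matern 3/2 covariance as a function of distance d (assumed kernel choice).
    a = np.sqrt(3.0) * d / lengthscale
    return magnitude * (1.0 + a) * np.exp(-a)

def pose_kernel_matrix(ts, Rs, magnitude=1.0, lengthscale=1.0):
    # Covariance between all pairs of poses: nearby poses get high covariance.
    n = len(ts)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            d = pose_distance(ts[i], Rs[i], ts[j], Rs[j])
            K[i, j] = matern32(d, magnitude, lengthscale)
    return K

def gp_fuse_latents(K, Y, noise_var=0.1):
    # Batch GP posterior mean at the observed poses: K (K + s^2 I)^{-1} Y.
    # Y holds one flattened latent code per row (one row per view).
    n = K.shape[0]
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))
    return K @ alpha

# Toy data: four views translating along x with identical orientation,
# and random stand-ins for the encoder's bottleneck outputs.
rng = np.random.default_rng(0)
ts = [np.array([0.1 * k, 0.0, 0.0]) for k in range(4)]
Rs = [np.eye(3) for _ in range(4)]
Y = rng.normal(size=(4, 8))

K = pose_kernel_matrix(ts, Rs)
Z = gp_fuse_latents(K, Y)  # fused latent codes, same shape as Y
```

In this sketch each fused code in Z is a weighted combination of the latent codes from all views, with weights that decay as poses move apart; in a full pipeline the fused codes would be passed to the decoder in place of the raw encodings.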
Taxonomy
Methods: Gaussian Process
