TL;DR
VDPP is a fast, scalable, RGB-free post-processing framework that refines video depth estimates in real time, enabling practical deployment on edge devices without retraining existing models.
Contribution
The paper introduces VDPP, a post-processing method based on geometric refinement that significantly improves the speed and scalability of video depth estimation without sacrificing accuracy.
Findings
VDPP achieves over 43.5 FPS on NVIDIA Jetson Orin Nano.
It matches the temporal coherence of end-to-end models.
VDPP is RGB-free and easily integrates with existing depth models.
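To put the reported throughput in perspective, the short sketch below converts a frame rate into a per-frame time budget. It is only an arithmetic illustration: the helper name `frame_budget_ms` is hypothetical, and the 30 FPS comparison point is an assumed reference rate for illustration, not a figure from the paper.

```python
def frame_budget_ms(fps: float) -> float:
    """Return the time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Throughput reported for the NVIDIA Jetson Orin Nano (a lower bound: "over 43.5 FPS").
reported_fps = 43.5
# Assumed reference rate for comparison only; not taken from the paper.
reference_fps = 30.0

reported_budget = frame_budget_ms(reported_fps)
reference_budget = frame_budget_ms(reference_fps)

print(f"{reported_fps} FPS  -> {reported_budget:.1f} ms per frame")
print(f"{reference_fps} FPS  -> {reference_budget:.1f} ms per frame")
print(f"Headroom vs. reference: {reference_budget - reported_budget:.1f} ms per frame")
```

Because the reported figure is a lower bound on throughput, the computed budget of roughly 23 ms is an upper bound on the time each frame takes end to end on that device.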
Abstract
Video depth estimation is essential for providing 3D scene structure in applications ranging from autonomous driving to mixed reality. Although current end-to-end (E2E) models have achieved state-of-the-art performance, they function as tightly coupled systems that suffer from a significant adaptation lag whenever superior single-image depth estimators are released. To mitigate this issue, post-processing methods such as NVDS offer a modular plug-and-play alternative that can incorporate any evolving image depth model without retraining. However, existing post-processing methods still struggle to match the efficiency and practicality of E2E systems due to limited speed, limited accuracy, and reliance on RGB input. In this work, we revitalize the role of post-processing by proposing VDPP (Video Depth Post-Processing), a…
