STATIC : Surface Temporal Affine for TIme Consistency in Video Monocular Depth Estimation
Sunghun Yang, Minhyeok Lee, Suhwan Cho, Jungho Lee, Sangyoun Lee

TL;DR
STATIC is a video monocular depth estimation model that learns temporal consistency separately for static and dynamic regions, without extra inputs such as optical flow or camera parameters, and reports state-of-the-art results on KITTI and NYUv2.
Contribution
The paper presents a new model that learns temporal consistency separately for static and dynamic areas using surface normals and feature similarity, avoiding reliance on optical flow or camera data.
Findings
Achieves state-of-the-art results on KITTI and NYUv2 datasets.
Effectively models static and dynamic regions independently.
Improves temporal consistency without additional information.
Abstract
Video monocular depth estimation is essential for applications such as autonomous driving, AR/VR, and robotics. Recent transformer-based monocular depth estimation models perform well on single images but struggle to keep depth consistent across video frames. Traditional methods aim to improve temporal consistency using multi-frame temporal modules or prior information such as optical flow and camera parameters. However, these approaches face issues such as high memory use, reduced performance with dynamic or irregular motion, and limited motion understanding. We propose STATIC, a novel model that independently learns temporal consistency in static and dynamic areas without additional information. A difference mask from surface normals identifies static and dynamic areas by measuring directional variance. For static areas, the Masked Static (MS) module enhances temporal consistency…
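The abstract is truncated here and does not say how the difference mask is computed, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes surface normals are approximated from depth gradients, that "directional variance" is measured as cosine distance between the normals of two consecutive frames, and that a fixed threshold separates static from dynamic pixels. The function names, the threshold value, and the toy depth maps are all hypothetical.

```python
import numpy as np

def normals_from_depth(depth):
    """Approximate unit surface normals (H, W, 3) from a depth map (H, W).

    Assumption: normals are taken as (-dz/dx, -dz/dy, 1), normalized.
    """
    dz_dy, dz_dx = np.gradient(depth.astype(np.float64))
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(dz_dx)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def difference_mask(depth_prev, depth_curr, threshold=0.05):
    """Return a boolean (H, W) mask: True where normal direction changed.

    Assumption: change is measured as 1 - cosine similarity between the
    normals of the two frames; the threshold is an illustrative value.
    """
    n_prev = normals_from_depth(depth_prev)
    n_curr = normals_from_depth(depth_curr)
    cos_sim = np.sum(n_prev * n_curr, axis=-1)
    return (1.0 - cos_sim) > threshold

# Toy example: a flat scene where one block tilts in the second frame.
depth_prev = np.ones((8, 8))
depth_curr = depth_prev.copy()
depth_curr[2:6, 2:6] += np.arange(4)[None, :]  # ramp inside the block

dynamic = difference_mask(depth_prev, depth_curr)  # True = treated as dynamic
static = ~dynamic                                  # True = treated as static
```

In this sketch, pixels whose normal direction barely changes between frames fall into the static mask and the rest into the dynamic mask. A real system would run it on predicted depth or feature maps rather than on synthetic planes.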
Taxonomy
Topics: Advanced Vision and Imaging · Industrial Vision Systems and Defect Detection · Image Processing Techniques and Applications
