Self-Attention Dense Depth Estimation Network for Unrectified Video Sequences
Alwyn Mathew, Aditya Prakash Patra, Jimson Mathew

TL;DR
This paper introduces a self-attention based deep learning network for dense depth estimation from unrectified video sequences, incorporating camera distortion into training, and achieving competitive results without relying on rectified images.
Contribution
The paper presents a novel self-attention depth and ego-motion network specifically designed for unrectified images, including camera distortion in the training process.
Findings
Performs competitively with methods using rectified images
Effective depth estimation from unrectified video sequences
Incorporates camera distortion into the training pipeline
Abstract
Dense depth estimation of a 3D scene has numerous applications, mainly in robotics and surveillance. LiDAR and radar sensors are the hardware solution for real-time depth estimation, but these sensors produce sparse depth maps and are sometimes unreliable. In recent years, research aimed at tackling depth estimation from a single 2D image has received a lot of attention. Deep learning based self-supervised depth estimation methods trained on rectified stereo and monocular video frames have shown promising results. We propose a self-attention based depth and ego-motion network for unrectified images. We also introduce non-differentiable distortion of the camera into the training pipeline. Our approach performs competitively when compared to other established approaches that use rectified images for depth estimation.
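To make concrete what working with unrectified images involves, the sketch below projects a 3D point in the camera frame to a pixel location while keeping lens distortion in the projection. The abstract does not say which distortion model the authors use, so this assumes a common two-coefficient radial polynomial model. The function names, intrinsics, and coefficient values are hypothetical and chosen only for illustration. It shows the forward camera model only and makes no claim about how the paper handles non-differentiability during training.

```python
def distort_normalized(x, y, k1, k2):
    """Apply radial distortion to normalized image coordinates (x, y).

    Assumed model (not taken from the paper): each point is scaled by
    1 + k1 * r^2 + k2 * r^4, where r is its distance from the optical axis.
    """
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor


def project_unrectified(point, fx, fy, cx, cy, k1=0.0, k2=0.0):
    """Project a camera-frame 3D point to pixel coordinates of a distorted image.

    point: (X, Y, Z) with Z > 0.
    fx, fy, cx, cy: pinhole intrinsics (hypothetical values in the usage below).
    k1, k2: radial distortion coefficients; zero gives the rectified pinhole case.
    """
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # Pinhole projection onto the normalized image plane.
    x, y = X / Z, Y / Z
    # Lens distortion is applied before the intrinsics.
    xd, yd = distort_normalized(x, y, k1, k2)
    return fx * xd + cx, fy * yd + cy


# Usage: compare the rectified and unrectified projections of the same point.
rectified = project_unrectified((1.0, 0.0, 2.0), 500.0, 500.0, 320.0, 240.0)
unrectified = project_unrectified((1.0, 0.0, 2.0), 500.0, 500.0, 320.0, 240.0, k1=-0.2)
print(rectified, unrectified)
```

With a negative `k1` (barrel distortion), the projected point moves toward the principal point relative to the pinhole case, and a point on the optical axis is unaffected. A pipeline trained on rectified images can ignore this displacement, whereas one trained on unrectified images has to account for it when relating depth and ego-motion to pixel positions.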
