Self-supervised Monocular Depth Estimation with Large Kernel Attention
Xuezhi Xiang, Yao Wang, Lei Zhang, Denis Ombati, Himaloy Himu, Xiantong Zhen

TL;DR
This paper introduces a self-supervised monocular depth estimation method using large kernel attention to better model spatial and channel features, resulting in finer depth maps without losing 2D structure.
Contribution
It proposes a novel decoder with large kernel attention and an up-sampling module to improve depth detail accuracy while preserving spatial information.
Findings
Achieves competitive results on the KITTI dataset
Models long-distance dependencies without losing 2D structure
Enhances fine detail recovery in depth maps
Abstract
Self-supervised monocular depth estimation has emerged as a promising approach since it does not rely on labeled training data. Most methods combine convolution and Transformer to model long-distance dependencies and estimate depth accurately. However, the Transformer treats 2D image features as 1D sequences; positional encoding only partly mitigates the resulting loss of spatial information between feature blocks, and the Transformer tends to overlook channel features, which limits the performance of depth estimation. In this paper, we propose a self-supervised monocular depth estimation network to recover finer details. Specifically, we propose a decoder based on large kernel attention, which can model long-distance dependencies without compromising the two-dimensional structure of features while maintaining feature channel adaptivity. In addition, we introduce an up-sampling module to accurately recover the fine…
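To make the idea of large kernel attention concrete, here is a minimal pure-Python sketch. It assumes the decomposition commonly used for large kernel attention (a depth-wise convolution, then a depth-wise dilated convolution, then a 1x1 convolution whose output gates the input element-wise); the paper's decoder may differ in its details. All function names and the default sizes (5x5, 7x7, dilation 3) are illustrative assumptions, not taken from the paper.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # nested lists indexed [channel][row][col]


def depthwise_conv2d(x: Tensor3, kernels: Tensor3, dilation: int = 1) -> Tensor3:
    """Per-channel cross-correlation with zero 'same' padding (odd kernel size)."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    k = len(kernels[0])
    pad = dilation * (k - 1) // 2
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for c in range(channels):
        for i in range(height):
            for j in range(width):
                acc = 0.0
                for u in range(k):
                    for v in range(k):
                        ii = i + u * dilation - pad
                        jj = j + v * dilation - pad
                        # Positions outside the map count as zero padding.
                        if 0 <= ii < height and 0 <= jj < width:
                            acc += kernels[c][u][v] * x[c][ii][jj]
                out[c][i][j] = acc
    return out


def pointwise_conv(x: Tensor3, weight: List[List[float]]) -> Tensor3:
    """1x1 convolution: mixes channels at each pixel; weight is [C_out][C_in]."""
    c_in, height, width = len(x), len(x[0]), len(x[0][0])
    return [[[sum(weight[o][c] * x[c][i][j] for c in range(c_in))
              for j in range(width)]
             for i in range(height)]
            for o in range(len(weight))]


def large_kernel_attention(x: Tensor3, dw: Tensor3, dw_dilated: Tensor3,
                           pw: List[List[float]], dilation: int = 3) -> Tensor3:
    """Assumed LKA form: attention = 1x1(DWDilated(DW(x))), output = attention * x."""
    attn = depthwise_conv2d(x, dw)                    # local spatial context
    attn = depthwise_conv2d(attn, dw_dilated, dilation)  # long-range context
    attn = pointwise_conv(attn, pw)                   # channel adaptivity
    # Element-wise gating keeps the [C][H][W] layout: no flattening to 1D.
    return [[[attn[c][i][j] * x[c][i][j]
              for j in range(len(x[0][0]))]
             for i in range(len(x[0]))]
            for c in range(len(x))]


def receptive_field(dw_size: int = 5, dilated_size: int = 7, dilation: int = 3) -> int:
    """Side length of the region seen by the two stacked depth-wise convolutions."""
    return dw_size + (dilated_size - 1) * dilation


def delta_kernel(k: int) -> List[List[float]]:
    """k x k kernel with 1.0 at the centre (identity filter), used for checking."""
    return [[1.0 if (u == k // 2 and v == k // 2) else 0.0 for v in range(k)]
            for u in range(k)]
```

With identity (delta) kernels and an identity 1x1 weight, the attention map equals the input, so the output is the input squared element-wise, which gives a quick sanity check. The point of the sketch is that the feature map stays a 2D grid per channel throughout, in contrast to a Transformer that flattens it into a 1D token sequence.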
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Optical measurement and interference techniques
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Softmax · Layer Normalization · Dropout · Dense Connections
