Self-supervised Monocular Depth Estimation with Large Kernel Attention

Xuezhi Xiang; Yao Wang; Lei Zhang; Denis Ombati; Himaloy Himu; Xiantong Zhen

arXiv:2409.17895·cs.CV·September 27, 2024

TL;DR

This paper introduces a self-supervised monocular depth estimation method using large kernel attention to better model spatial and channel features, resulting in finer depth maps without losing 2D structure.

Contribution

It proposes a novel decoder with large kernel attention and an up-sampling module to improve depth detail accuracy while preserving spatial information.

Findings

01. Achieves competitive results on the KITTI dataset

02. Models long-distance dependencies without losing 2D structure

03. Enhances fine detail recovery in depth maps

Abstract

Self-supervised monocular depth estimation has emerged as a promising approach since it does not rely on labeled training data. Most methods combine convolution and Transformer to model long-distance dependencies to estimate depth accurately. However, the Transformer treats 2D image features as 1D sequences, and positional encoding only somewhat mitigates the resulting loss of spatial information between different feature blocks; it also tends to overlook channel features, which limits the performance of depth estimation. In this paper, we propose a self-supervised monocular depth estimation network to get finer details. Specifically, we propose a decoder based on large kernel attention, which can model long-distance dependencies without compromising the two-dimensional structure of features while maintaining feature channel adaptivity. In addition, we introduce an up-sampling module to accurately recover the fine…
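
The large kernel attention idea mentioned in the abstract can be illustrated with a small sketch. It assumes the decomposition popularized by the Visual Attention Network: a 5×5 depth-wise convolution, then a 7×7 depth-wise convolution with dilation 3, then a 1×1 convolution, with the result used to reweight the input element-wise. The abstract does not state which kernel sizes or layer ordering this paper's decoder uses, so those values are assumptions, and all function names here are hypothetical. Pure Python on nested lists is used only to keep the example dependency-free; a real model would use a deep learning framework.

```python
def delta_kernel(k):
    """k x k kernel with a single 1.0 at the centre (acts as an identity filter)."""
    kern = [[0.0] * k for _ in range(k)]
    kern[k // 2][k // 2] = 1.0
    return kern


def depthwise_conv2d(x, kernels, dilation=1):
    """Depth-wise 2D convolution with zero padding that preserves H x W.

    x:       list of C feature maps, each an H x W nested list of floats
    kernels: list of C square kernels (odd size), one per channel
    """
    out = []
    for fmap, kern in zip(x, kernels):
        h, w, k = len(fmap), len(fmap[0]), len(kern)
        half = k // 2
        res = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for a in range(k):
                    for b in range(k):
                        # Dilation spreads the kernel taps apart.
                        ii = i + (a - half) * dilation
                        jj = j + (b - half) * dilation
                        if 0 <= ii < h and 0 <= jj < w:
                            acc += kern[a][b] * fmap[ii][jj]
                res[i][j] = acc
        out.append(res)
    return out


def pointwise_conv(x, weight):
    """1x1 convolution: weight[o][c] mixes input channel c into output channel o."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(weight[o][c] * x[c][i][j] for c in range(len(x)))
          for j in range(w)] for i in range(h)]
        for o in range(len(weight))
    ]


def large_kernel_attention(x, dw_kernels, dwd_kernels, pw_weight, dilation=3):
    """Assumed LKA layout: DW conv -> dilated DW conv -> 1x1 conv -> gate the input."""
    attn = depthwise_conv2d(x, dw_kernels)                 # local spatial context
    attn = depthwise_conv2d(attn, dwd_kernels, dilation)   # long-range spatial context
    attn = pointwise_conv(attn, pw_weight)                 # channel mixing
    # The attention map keeps the 2D layout and reweights each input value.
    return [
        [[attn[c][i][j] * x[c][i][j] for j in range(len(x[c][i]))]
         for i in range(len(x[c]))]
        for c in range(len(x))
    ]


if __name__ == "__main__":
    channels, size = 2, 4
    x = [[[float(i + j + c) for j in range(size)] for i in range(size)]
         for c in range(channels)]
    dw = [delta_kernel(5) for _ in range(channels)]    # assumed 5x5 depth-wise kernels
    dwd = [delta_kernel(7) for _ in range(channels)]   # assumed 7x7 dilated kernels
    pw = [[1.0, 0.0], [0.0, 1.0]]                      # identity channel mixing
    y = large_kernel_attention(x, dw, dwd, pw)
    print(len(y), len(y[0]), len(y[0][0]))
```

Because every step operates on the feature maps in place, the features are never flattened into a 1D token sequence, which is the property the abstract contrasts with Transformer-based decoders. The 1×1 convolution is what lets the attention map vary per channel.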

Taxonomy

Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Optical measurement and interference techniques

Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Softmax · Layer Normalization · Dropout · Dense Connections