Adaptive Depth-converted-Scale Convolution for Self-supervised Monocular Depth Estimation

Yanbo Gao; Huibin Bai; Huasong Zhou; Xingyu Gao; Shuai Li; Xun Cai; Hui Yuan; Wei Hua; Tian Xie

arXiv:2604.07665 · cs.CV · April 10, 2026

TL;DR

This paper introduces an adaptive convolution method that explicitly models how an object's apparent size changes with its depth, improving self-supervised monocular depth estimation and achieving state-of-the-art results on KITTI.

Contribution

The paper proposes Depth-converted-Scale Convolution (DcSConv) and a fusion module that enhance existing CNN-based depth estimators by explicitly modeling the relationship between object size and object depth.
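The size-depth relationship referred to above follows, in its simplest form, from the standard pinhole camera model. The derivation below is general background, not the paper's own depth-to-scale conversion (which this summary does not give); the symbols and the example numbers (focal length f = 700 px, a 1.5 m object) are hypothetical.

```latex
% Pinhole projection: image height h (pixels) of an object with physical height H
% at depth Z, for a camera with focal length f (pixels)
h = \frac{f\,H}{Z}
% The same object observed at two depths Z_1 and Z_2:
\frac{h_1}{h_2} = \frac{Z_2}{Z_1}
% Example: f = 700\,\text{px},\ H = 1.5\,\text{m}
% Z = 10\,\text{m}:\ h = \frac{700 \cdot 1.5}{10} = 105\,\text{px}
% Z = 20\,\text{m}:\ h = \frac{700 \cdot 1.5}{20} = 52.5\,\text{px}
```

Doubling the depth halves the apparent size, which is why a fixed-size convolution kernel covers a different fraction of the same object as it moves through a monocular video.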

Findings

01 Achieves up to an 11.6% reduction in SqRel (squared relative error) on the KITTI benchmark.

02 DcSConv improves feature extraction by adapting the scale of the convolution receptive field.

03 The method can be used as a plug-and-play module in existing CNNs.
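Finding 01 is reported in SqRel, the squared relative error commonly used in KITTI depth evaluation. The minimal NumPy sketch below computes the metric and a percentage reduction between two results. The function names are illustrative, and standard KITTI protocols add steps omitted here, such as capping depth at a maximum range and median-scaling self-supervised predictions.

```python
import numpy as np


def sq_rel(pred, gt):
    """Squared relative error: mean((pred - gt)^2 / gt) over pixels with valid ground truth."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    mask = gt > 0  # pixels without ground-truth depth are stored as 0 and ignored
    return float(np.mean((pred[mask] - gt[mask]) ** 2 / gt[mask]))


def relative_reduction(baseline, improved):
    """Percentage reduction of an error metric relative to a baseline value."""
    return (baseline - improved) / baseline * 100.0
```

With toy values, `sq_rel([12.0, 18.0], [10.0, 20.0])` gives approximately 0.3, since (4/10 + 4/20) / 2 = 0.3. A reduction such as "11.6%" in the findings is the output of `relative_reduction` applied to a baseline SqRel and the improved SqRel.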

Abstract

Self-supervised monocular depth estimation (MDE) has received increasing interest in recent years. Objects in the scene, including their sizes and the relationships among them, are the main cues for extracting scene structure. However, previous works do not explicitly handle the change in an object's size caused by a change in its depth. In monocular video in particular, the apparent size of the same object changes continuously, resulting in size-depth ambiguity. To address this problem, we propose a Depth-converted-Scale Convolution (DcSConv) enhanced monocular depth estimation framework, which incorporates the prior relationship between object depth and object scale to extract features from appropriately scaled convolution receptive fields. The proposed DcSConv focuses on the adaptive scale of the convolution filter instead of the local deformation of…
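The idea of choosing the receptive-field scale from depth can be illustrated with a toy, single-channel NumPy sketch. This is not the authors' DcSConv. It assumes a hypothetical mapping `depth_to_scale` (scale = d_ref / depth, clipped, following the pinhole relation that apparent size is proportional to 1/depth), and a 3×3 kernel whose sampling grid at each pixel is uniformly stretched by that scale and read with bilinear interpolation. A uniform stretch, rather than free per-tap offsets, mirrors the abstract's contrast between adaptive scale and local deformation. All names and the clipping range are illustrative assumptions.

```python
import numpy as np


def bilinear_sample(img, y, x):
    """Sample a 2-D array at fractional coordinates (y, x), clamping to the border."""
    h, w = img.shape
    y = np.clip(y, 0, h - 1)
    x = np.clip(x, 0, w - 1)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bot = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bot * wy


def depth_to_scale(depth, d_ref, s_min=0.5, s_max=3.0):
    """Hypothetical depth-to-scale mapping: nearer pixels (small depth) get larger scales."""
    return np.clip(d_ref / np.maximum(depth, 1e-6), s_min, s_max)


def scale_adaptive_conv(img, kernel, scale):
    """3x3 cross-correlation (as in CNNs) whose taps at pixel p sit at p + scale[p] * (dy, dx)."""
    h, w = img.shape
    ys, xs = np.meshgrid(np.arange(h, dtype=float), np.arange(w, dtype=float), indexing="ij")
    out = np.zeros((h, w))
    for i, dy in enumerate((-1, 0, 1)):
        for j, dx in enumerate((-1, 0, 1)):
            # Shared kernel weight, per-pixel stretched sampling position
            out += kernel[i, j] * bilinear_sample(img, ys + scale * dy, xs + scale * dx)
    return out
```

With `scale` equal to 1 everywhere, this reduces to an ordinary 3×3 convolution with edge padding. With `depth_to_scale(depth, d_ref=10.0)`, a pixel at 5 m samples a grid twice as wide as one at 10 m. In a trainable network, the stretched positions could instead be expressed as offsets for a deformable-convolution operator such as `torchvision.ops.deform_conv2d`, but how the paper actually realizes this is not stated in this summary.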
