Attention-based Context Aggregation Network for Monocular Depth Estimation
Yuru Chen, Haitao Zhao, Zhengwei Hu

TL;DR
This paper introduces an attention-based network for monocular depth estimation. The network models context adaptively by combining image-level and pixel-level information, and it reduces discretization error to improve accuracy.
Contribution
The paper proposes the attention-based context aggregation network (ACAN), which uses self-attention to aggregate context adaptively. It also proposes a soft ordinal inference step that converts discrete ordinal predictions into continuous depth values.
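The following is a minimal NumPy sketch of attention-based context aggregation. It is not the authors' implementation. Each pixel's pixel-level context is a similarity-weighted sum over all pixels, so the context region is learned rather than fixed by a dilation rate. Image-level context is taken here as a global average broadcast to every pixel. The projection matrices `w_q`, `w_k`, `w_v`, the scaled dot-product similarity, and fusion by concatenation are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along one axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pixel_level_context(feat, w_q, w_k, w_v):
    """feat: (H, W, C). Assumed form: scaled dot-product self-attention
    over all H*W pixels; each row of `attn` sums to 1."""
    H, W, C = feat.shape
    x = feat.reshape(H * W, C)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)  # (HW, HW)
    return (attn @ v).reshape(H, W, -1), attn

def image_level_context(feat):
    """Assumed form: global average pooling, broadcast back to each pixel."""
    H, W, _ = feat.shape
    g = feat.mean(axis=(0, 1))
    return np.broadcast_to(g, (H, W, g.shape[0]))

def aggregate(feat, w_q, w_k, w_v):
    """Hypothetical fusion: concatenate input, pixel- and image-level context."""
    pix, attn = pixel_level_context(feat, w_q, w_k, w_v)
    img = image_level_context(feat)
    return np.concatenate([feat, pix, img], axis=-1), attn

# Usage with random features and random (untrained) projections.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 5, 8))
w_q = rng.standard_normal((8, 4))
w_k = rng.standard_normal((8, 4))
w_v = rng.standard_normal((8, 8))
out, attn = aggregate(feat, w_q, w_k, w_v)
```

In a trained network the projections would be learned convolutions. Concatenation is only one plausible way to fuse the two context types; summation or a further 1x1 convolution would be equally reasonable choices.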
Findings
Achieves competitive results on NYU Depth V2 and KITTI datasets.
Reduces discretization error, lowering RMSE by about 1%.
Demonstrates the effectiveness of attention-based context modeling.
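This summary does not give the exact form of the soft ordinal inference. The sketch below shows one plausible reading and should not be taken as the paper's formula. It makes three assumptions. First, depth is split into K bins that are uniform in log depth. Second, the network outputs ordinal probabilities P(depth > t_k) for the K-1 interior thresholds. Third, the soft rule takes the expected log-depth over bins, whereas the hard rule counts the thresholds whose probability is at least 0.5. The helper names are hypothetical.

```python
import numpy as np

def sid_edges(alpha, beta, K):
    """Hypothetical helper: K+1 bin edges spaced uniformly in log depth."""
    return np.exp(np.linspace(np.log(alpha), np.log(beta), K + 1))

def hard_ordinal_inference(p_gt, edges):
    """p_gt[k] ~ P(depth > edges[k+1]), length K-1.
    Counts thresholds with probability >= 0.5, then returns the
    geometric center of that bin (a discrete set of possible outputs)."""
    k = int(np.sum(p_gt >= 0.5))
    return float(np.sqrt(edges[k] * edges[k + 1]))

def soft_ordinal_inference(p_gt, edges):
    """Assumed soft rule: convert cumulative probabilities to per-bin mass
    and return exp(E[log depth]), a continuous value."""
    # Make the survival curve non-increasing, then pad with P=1 and P=0.
    surv = np.minimum.accumulate(np.clip(p_gt, 0.0, 1.0))
    surv = np.concatenate(([1.0], surv, [0.0]))
    mass = surv[:-1] - surv[1:]  # P(depth in bin k); non-negative, sums to 1
    log_centers = 0.5 * (np.log(edges[:-1]) + np.log(edges[1:]))
    return float(np.exp(np.sum(mass * log_centers)))

# Usage: 8 bins between 1 m and 10 m, made-up ordinal probabilities.
edges = sid_edges(1.0, 10.0, 8)
p = np.array([0.99, 0.97, 0.9, 0.55, 0.45, 0.1, 0.02])
hard = hard_ordinal_inference(p, edges)  # snaps to the center of bin 4
soft = soft_ordinal_inference(p, edges)  # lies between bin centers
```

When the probabilities are exactly 0 or 1, the two rules agree. When the probabilities are uncertain, the soft rule can land between bin centers. This is how a soft readout avoids the quantization that the hard rule imposes.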
Abstract
Depth estimation is a traditional computer vision task that plays a crucial role in understanding 3D scene geometry. Recently, methods based on deep convolutional neural networks have achieved promising results in monocular depth estimation. In particular, frameworks that combine multi-scale features extracted by a dilated-convolution block (atrous spatial pyramid pooling, ASPP) have brought significant improvements in dense labeling tasks. However, discretized, predefined dilation rates cannot capture continuous context information that varies across scenes. They also easily introduce grid artifacts into depth estimates. In this paper, we propose an attention-based context aggregation network (ACAN) to tackle these difficulties. Based on the self-attention model, ACAN adaptively learns task-specific similarities between pixels to model the…
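The grid artifacts mentioned above come from the sparse sampling pattern of dilated convolutions. The 1-D sketch below is a standard illustration of this "gridding" effect and is not taken from the paper. It lists the input offsets that a stack of 3-tap convolutions with given dilation rates can reach. The function names are hypothetical.

```python
def receptive_offsets(dilations, kernel_size=3):
    """1-D input offsets reachable by stacking convolutions of the given
    kernel size with the given dilation rates (one rate per layer)."""
    half = kernel_size // 2
    offsets = {0}
    for r in dilations:
        taps = [r * t for t in range(-half, half + 1)]
        offsets = {o + t for o in offsets for t in taps}
    return sorted(offsets)

def coverage(dilations, kernel_size=3):
    """Fraction of positions inside the receptive field that are sampled."""
    offs = receptive_offsets(dilations, kernel_size)
    return len(offs) / (offs[-1] - offs[0] + 1)

print(receptive_offsets([2, 2]))  # [-4, -2, 0, 2, 4]: odd offsets never seen
print(receptive_offsets([1, 2]))  # [-3, -2, -1, 0, 1, 2, 3]: dense
print(receptive_offsets([6]))     # [-6, 0, 6]: one large-rate branch alone
```

Repeating a fixed rate leaves holes that never contribute to the output. A single large-rate branch, such as those used in ASPP, also samples only a sparse lattice. Attention weights defined over all pixels do not have this fixed lattice.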
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Advanced Image Processing Techniques
