AdaBins: Depth Estimation using Adaptive Bins

Shariq Farooq Bhat; Ibraheem Alhashim; Peter Wonka

arXiv:2011.14141 · cs.CV · March 30, 2022

TL;DR

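AdaBins is a transformer-based block for single-image depth estimation. It divides the depth range into bins whose centers are estimated per image and predicts each depth value as a linear combination of those bin centers. The authors report improvements over previous methods on several popular depth datasets.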

Contribution

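The paper proposes AdaBins, a transformer-based building block added to a baseline encoder-decoder convolutional network. The block estimates depth bin centers adaptively per image and computes the final depth values as linear combinations of those centers, using global processing of the image to improve the dense depth map predicted from a single RGB input.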

Findings

01 State-of-the-art performance on multiple datasets

02 Significant accuracy improvements over previous methods

03 Effective ablation study validating the approach

Abstract

We address the problem of estimating a high quality dense depth map from a single RGB input image. We start out with a baseline encoder-decoder convolutional neural network architecture and pose the question of how the global processing of information can help improve overall depth estimation. To this end, we propose a transformer-based architecture block that divides the depth range into bins whose center value is estimated adaptively per image. The final depth values are estimated as linear combinations of the bin centers. We call our new building block AdaBins. Our results show a decisive improvement over the state-of-the-art on several popular depth datasets across all metrics. We also validate the effectiveness of the proposed block with an ablation study and provide the code and corresponding pre-trained weights of the new state-of-the-art model.
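The step "depth values are estimated as linear combinations of the bin centers" can be illustrated with a minimal NumPy sketch. It is not the authors' released code. The function names `bin_centers` and `depth_from_bins` are hypothetical, and two details are assumptions the abstract does not state: that the per-image bin widths are normalized to sum to 1 over the range `[d_min, d_max]`, and that the per-pixel combination weights come from a softmax over per-bin scores.

```python
import numpy as np


def bin_centers(widths, d_min, d_max):
    """Turn per-image bin widths into bin centers inside [d_min, d_max].

    Assumption: widths are positive and are normalized here to sum to 1.
    """
    widths = np.asarray(widths, dtype=np.float64)
    widths = widths / widths.sum()
    # Bin edges are the cumulative sum of the widths, scaled to the depth range.
    edges = d_min + (d_max - d_min) * np.concatenate(([0.0], np.cumsum(widths)))
    # Each center is the midpoint of its two edges.
    return 0.5 * (edges[:-1] + edges[1:])


def depth_from_bins(logits, centers):
    """Combine bin centers into a depth map.

    logits:  array of shape (N, H, W), one score per bin and pixel.
    centers: array of shape (N,), the bin centers for this image.
    Assumption: weights are a softmax over the bin axis, so every pixel's
    depth is a convex combination of the centers.
    """
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=0, keepdims=True)
    # Weighted sum over the bin axis gives an (H, W) depth map.
    return np.tensordot(centers, probs, axes=(0, 0))
```

For example, widths `[1, 1, 2]` over the range 0 to 8 give edges 0, 2, 4, 8 and centers 1, 3, 6. Because the weights are non-negative and sum to 1 at each pixel, every predicted depth lies between the smallest and largest center.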

Taxonomy

Methods: Adaptive Bins · Linear Layer · Residual Connection · Layer Normalization · Softmax · Attention Is All You Need · Multi-Head Attention · EfficientNet · Vision Transformer