BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation
Zhenyu Li, Xuyang Wang, Xianming Liu, Junjun Jiang

TL;DR
BinsFormer introduces a novel Transformer-based framework for monocular depth estimation that generates adaptive bins and uses a multi-scale decoder to better capture spatial geometry, achieving state-of-the-art results on KITTI, NYU, and SUN RGB-D.
Contribution
The paper proposes using a Transformer decoder to generate adaptive bins and a multi-scale decoder structure to capture spatial geometry, which together improve depth estimation accuracy.
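To make "adaptive bins" concrete, here is a minimal sketch of how a set of N bin queries could be turned into bin widths, edges, and centers over a fixed depth range. It assumes that each decoder query has already been mapped (for example by a small MLP head) to one scalar logit. The function name `bins_from_queries`, the softmax normalization, and the default range `[d_min, d_max]` are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def bins_from_queries(bin_logits, d_min=1e-3, d_max=10.0):
    """Hypothetical helper: convert one logit per bin query into adaptive bins.

    Assumption: bin_logits holds one scalar per Transformer-decoder query.
    The softmax normalization and the default depth range are illustrative
    choices, not taken from the BinsFormer paper.
    """
    logits = np.asarray(bin_logits, dtype=float)
    # Softmax keeps every width positive and makes them sum to 1.
    widths = np.exp(logits - logits.max())
    widths = widths / widths.sum()
    # Scale normalized widths to the metric depth range.
    widths = (d_max - d_min) * widths
    # N widths give N + 1 edges, starting at d_min and ending at d_max.
    edges = d_min + np.concatenate(([0.0], np.cumsum(widths)))
    # Each bin is represented by the midpoint of its two edges.
    centers = 0.5 * (edges[:-1] + edges[1:])
    return widths, edges, centers
```

For example, four equal logits over the range 0 to 8 give widths of 2 and centers at 1, 3, 5, and 7. Because the logits differ per image, the bins shift toward depth ranges that matter for that scene, which is what distinguishes adaptive bins from a fixed, uniform discretization.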
Findings
Outperforms existing methods on the KITTI, NYU, and SUN RGB-D datasets.
Treats bin generation as a direct set-to-set prediction problem.
Incorporates scene understanding queries to enhance depth accuracy.
Abstract
Monocular depth estimation is a fundamental task in computer vision and has drawn increasing attention. Recently, some methods have reformulated it as a classification-regression task to boost model performance, where continuous depth is estimated as a linear combination of predicted probability distributions and discrete bins. In this paper, we present a novel framework called BinsFormer, tailored for classification-regression-based depth estimation. It focuses on two crucial components of this task: 1) proper generation of adaptive bins and 2) sufficient interaction between the probability distribution and bin predictions. Specifically, we employ the Transformer decoder to generate bins, taking the novel view of bin generation as a direct set-to-set prediction problem. We further integrate a multi-scale decoder structure to achieve a comprehensive understanding of spatial geometry information…
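The classification-regression step described in the abstract computes the depth at pixel x as d(x) = Σ_k p_k(x) c_k, where p_k(x) is the predicted probability of bin k and c_k is its center. The sketch below assumes, in the style of query-based segmentation models, that the per-pixel logits come from dot products between N bin-query embeddings and per-pixel embeddings, followed by a softmax over bins. The array shapes and the function names `softmax` and `depth_from_bins` are hypothetical; the paper's exact interaction between queries and pixel features may differ.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def depth_from_bins(query_emb, pixel_emb, centers):
    """Hypothetical sketch of classification-regression depth prediction.

    query_emb: (N, C) one embedding per bin query (assumed decoder output)
    pixel_emb: (C, H, W) per-pixel embeddings (assumed pixel-decoder output)
    centers:   (N,) bin centers, e.g. from an adaptive-bin head
    returns:   (H, W) depth map
    """
    # Dot product between every bin query and every pixel embedding.
    logits = np.einsum("nc,chw->nhw", query_emb, pixel_emb)  # (N, H, W)
    # Probability distribution over the N bins at each pixel.
    probs = softmax(logits, axis=0)
    # Depth is the probability-weighted sum of bin centers.
    return np.einsum("nhw,n->hw", probs, centers)
```

A soft, probability-weighted combination like this keeps the output continuous and differentiable, unlike taking the single most likely bin, which would quantize depth to the bin centers.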
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Advanced Image and Video Retrieval Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Dropout · Softmax · Layer Normalization · Label Smoothing · Byte Pair Encoding · Position-Wise Feed-Forward Layer
