Scalable Autoregressive Monocular Depth Estimation
Jinhong Wang, Jian Liu, Dongqi Tang, Weiqiang Wang, Wentong Li, Danny Chen, Jintai Chen, Jian Wu

TL;DR
This paper introduces a scalable autoregressive model for monocular depth estimation that achieves state-of-the-art results on multiple datasets by treating depth maps as tokens and employing coarse-to-fine autoregressive objectives.
Contribution
The paper presents a novel autoregressive depth estimation framework that scales to large models and demonstrates superior performance and generalization on benchmark datasets.
Findings
Achieves new SOTA on KITTI and NYU Depth v2 datasets.
Scales up to 2.0 billion parameters with improved RMSE.
Shows strong zero-shot generalization on unseen datasets.
Abstract
This paper shows that the autoregressive model is an effective and scalable monocular depth estimator. Our idea is simple: we tackle the monocular depth estimation (MDE) task with an autoregressive prediction paradigm, based on two core designs. First, our depth autoregressive model (DAR) treats depth maps at different resolutions as a set of tokens, and applies a low-to-high resolution autoregressive objective with a patch-wise causal mask. Second, our DAR recursively discretizes the entire depth range into more compact intervals, and attains a coarse-to-fine granularity autoregressive objective in an ordinal-regression manner. By coupling these two autoregressive objectives, our DAR establishes a new state of the art (SOTA) on KITTI and NYU Depth v2 by clear margins. Further, our scalable approach allows us to scale the model up to 2.0B parameters and achieve the best RMSE of 1.799 on the…
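The two designs named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the uniform bin splitting, the token counts per scale, and the use of a known depth value to generate targets are all assumptions made for illustration. The first function builds a scale-level causal mask in which tokens of one resolution may attend to tokens of the same or any coarser resolution; the second narrows a depth range step by step and records which interval is chosen at each step.

```python
import numpy as np


def scalewise_causal_mask(tokens_per_scale):
    """Return a boolean mask M where M[i, j] is True if token i may attend to token j.

    Assumption: tokens are ordered coarse to fine, and a token sees every
    token at its own scale and at all coarser scales.
    """
    scale_id = np.repeat(np.arange(len(tokens_per_scale)), tokens_per_scale)
    return scale_id[:, None] >= scale_id[None, :]


def coarse_to_fine_bins(depth, lo, hi, num_bins, steps):
    """Recursively split [lo, hi] into num_bins uniform intervals around `depth`.

    Returns the interval index picked at each step and the final interval.
    Uniform splitting is an assumption; the summary does not state the scheme.
    """
    indices = []
    for _ in range(steps):
        width = (hi - lo) / num_bins
        # Clamp so that depth == hi falls into the last interval.
        idx = min(int((depth - lo) / width), num_bins - 1)
        indices.append(idx)
        lo, hi = lo + idx * width, lo + (idx + 1) * width
    return indices, (lo, hi)


# Hypothetical layout: a 1x1 token map followed by a 2x2 token map.
mask = scalewise_causal_mask([1, 4])

# Hypothetical range of 0 to 8 metres, halved three times around depth 3.0.
indices, interval = coarse_to_fine_bins(3.0, 0.0, 8.0, num_bins=2, steps=3)
```

In this sketch the single coarse token cannot attend to the four finer tokens, while each finer token sees all five. The depth value 3.0 yields the index sequence [0, 1, 1] and the final interval (3.0, 4.0), so each step narrows the interval that the next step must resolve.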
Taxonomy
Topics: Advanced Vision and Imaging · Optical measurement and interference techniques · Image Processing Techniques and Applications
Methods: Sparse Evolutionary Training
