Fractal Autoregressive Depth Estimation with Continuous Token Diffusion
Jinchang Zhang, Xinrou Kang, Guoyu Lu

TL;DR
The paper introduces a fractal autoregressive diffusion framework for monocular depth estimation that combines multi-scale feature fusion, continuous depth modeling, and a recursive architecture to improve accuracy and efficiency.
Contribution
The paper proposes a fractal recursive autoregressive diffusion model that integrates multi-scale image features and continuous depth modeling within a self-similar hierarchy.
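The idea of reusing one unit across a self-similar, coarse-to-fine hierarchy can be illustrated with a toy recursion. This is only a sketch under stated assumptions, not the authors' architecture: `base_unit`, `upsample2x`, `fractal_depth`, the 0.1 residual weight, and the 1x1 starting token are all hypothetical stand-ins, and a real unit would be a learned autoregressive network rather than a fixed formula.

```python
from typing import List

Grid = List[List[float]]


def upsample2x(grid: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling of a 2-D grid."""
    out: Grid = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def base_unit(coarse: Grid, image_feat: Grid) -> Grid:
    """Hypothetical shared unit: upsample the coarse depth, then add a
    residual derived from image features at the finer scale."""
    up = upsample2x(coarse)
    return [[d + 0.1 * f for d, f in zip(d_row, f_row)]
            for d_row, f_row in zip(up, image_feat)]


def fractal_depth(level: int, feats: List[Grid]) -> Grid:
    """Build the depth map at `level` by recursing to the coarser level
    and applying the same base_unit at every scale."""
    if level == 0:
        return [[1.0]]  # placeholder 1x1 coarsest depth token (assumption)
    coarse = fractal_depth(level - 1, feats)
    return base_unit(coarse, feats[level])


# feats[l] is a (2**l x 2**l) grid of toy image features.
feats = [[[1.0] * (2 ** l) for _ in range(2 ** l)] for l in range(4)]
depth = fractal_depth(3, feats)  # 8x8 depth map
```

The point of the sketch is only that one function serves every scale, so the cost of defining the hierarchy does not grow with the number of scales.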
Findings
Achieves strong performance on standard benchmarks.
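Models depth as a continuous distribution through a conditional denoising diffusion loss, mitigating errors caused by discrete quantization.
Improves computational efficiency by reusing a base visual AR unit across scales in a fractal recursive architecture.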
Abstract
Monocular depth estimation can benefit from autoregressive (AR) generation, but direct AR modeling is hindered by the modality gap between RGB and depth, inefficient pixel-wise generation, and instability in continuous depth prediction. We propose a Fractal Visual Autoregressive Diffusion framework that reformulates depth estimation as a coarse-to-fine, next-scale autoregressive generation process. A VCFR module fuses multi-scale image features with current depth predictions to improve cross-modal conditioning, while a conditional denoising diffusion loss models depth distributions directly in continuous space and mitigates errors caused by discrete quantization. To improve computational efficiency, we organize the scale-wise generators into a fractal recursive architecture, reusing a base visual AR unit in a self-similar hierarchy. We further introduce an uncertainty-aware robust…
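As a rough illustration of a conditional denoising diffusion loss on continuous values, the sketch below uses the standard DDPM noise-prediction objective with a linear beta schedule. The abstract does not state the exact loss form or schedule, so both are assumptions, and `alpha_bar`, `diffusion_loss`, and `eps_model` are hypothetical names; `cond` stands in for whatever conditioning the autoregressive unit would supply.

```python
import math
import random
from typing import Callable, List, Sequence


def alpha_bar(t: int, num_steps: int,
              beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Cumulative product of (1 - beta_s) for s = 1..t, linear schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        frac = (s - 1) / max(num_steps - 1, 1)
        prod *= 1.0 - (beta_start + (beta_end - beta_start) * frac)
    return prod


def diffusion_loss(x0: Sequence[float], cond: object,
                   eps_model: Callable[[List[float], int, object], List[float]],
                   t: int, num_steps: int, rng: random.Random) -> float:
    """Mean squared error between the sampled noise and the model's
    noise prediction at step t (assumed DDPM-style objective)."""
    a = alpha_bar(t, num_steps)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    # Forward process: noise the clean continuous depth values.
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    pred = eps_model(x_t, t, cond)
    return sum((e - p) ** 2 for e, p in zip(eps, pred)) / len(x0)


def zero_model(x_t: List[float], t: int, cond: object) -> List[float]:
    """Trivial untrained stand-in that always predicts zero noise."""
    return [0.0 for _ in x_t]


loss = diffusion_loss([0.2, 0.5, 0.9], None, zero_model, 50, 100,
                      random.Random(0))
```

Because the targets stay real-valued throughout, no depth binning step appears anywhere in the loss, which is the property the abstract attributes to modeling depth in continuous space.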
Taxonomy
Topics: Advanced Vision and Imaging · Video Coding and Compression Technologies · Image Processing Techniques and Applications
