Hierarchical Awareness Adapters with Hybrid Pyramid Feature Fusion for Dense Depth Prediction

Wuqi Su; Huilun Song; Chen Zhao; Chi Xu

arXiv:2604.03339·cs.CV·April 7, 2026

TL;DR

This paper introduces a multilevel perceptual CRF model, built on a Swin Transformer backbone, that combines hybrid pyramid feature fusion with a hierarchical awareness adapter and a CRF decoder for monocular depth estimation. It reports state-of-the-art results on NYU Depth v2 and KITTI.

Contribution

The paper proposes a multilevel perceptual CRF model with hybrid pyramid feature fusion (HPF) and a hierarchical awareness adapter, aimed at improving both depth prediction accuracy and computational efficiency.

Findings

01

Achieves state-of-the-art performance on NYU Depth v2 and KITTI datasets.

02

Reduces Abs Rel to 0.088 on NYU Depth v2.

03

Attains near-perfect threshold accuracy on KITTI with 194M parameters.
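
For readers unfamiliar with the metrics named in the findings, the sketch below shows how Abs Rel and threshold accuracy are conventionally computed in monocular depth benchmarks. It is a minimal illustration, not the authors' evaluation code: the function names, the flat-list input format, and the default threshold of 1.25 are assumptions, and this page does not state which threshold the KITTI figure refers to.

```python
def _valid_pairs(pred, gt):
    """Keep only pixels with a positive ground-truth depth (assumed validity mask)."""
    return [(p, g) for p, g in zip(pred, gt) if g > 0]


def abs_rel(pred, gt):
    """Absolute relative error: mean of |pred - gt| / gt over valid pixels."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def threshold_accuracy(pred, gt, thresh=1.25):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < thresh.

    Predictions are assumed to be strictly positive.
    """
    pairs = _valid_pairs(pred, gt)
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < thresh)
    return hits / len(pairs)


# Toy depth values in metres (hypothetical); the last pixel has no ground truth.
pred = [1.1, 2.0, 3.0, 5.0]
gt = [1.0, 2.0, 4.0, 0.0]
print(abs_rel(pred, gt))
print(threshold_accuracy(pred, gt))
```

Lower Abs Rel is better, while threshold accuracy is a fraction in [0, 1] where higher is better.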

Abstract

Monocular depth estimation from a single RGB image remains a fundamental challenge in computer vision due to inherent scale ambiguity and the absence of explicit geometric cues. Existing approaches typically rely on increasingly complex network architectures to regress depth maps, which escalates training costs and computational overhead without fully exploiting inter-pixel spatial dependencies. We propose a multilevel perceptual conditional random field (CRF) model built upon the Swin Transformer backbone that addresses these limitations through three synergistic innovations: (1) an adaptive hybrid pyramid feature fusion (HPF) strategy that captures both short-range and long-range dependencies by combining multi-scale spatial pyramid pooling with biaxial feature aggregation, enabling effective integration of global and local contextual information; (2) a hierarchical awareness adapter…
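
Item (1) of the abstract combines multi-scale spatial pyramid pooling with biaxial feature aggregation. The sketch below is one possible reading of that idea on a single-channel feature map, written in plain Python for clarity. Everything here is an assumption rather than the paper's design: the function names, the bin sizes, the interpretation of "biaxial" as row-wise plus column-wise averaging, and the unweighted mean used for fusion (the paper describes its fusion as adaptive, which likely means learned weights).

```python
def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2D map (list of rows) down to bins x bins cells.

    Assumes bins is no larger than the map's height or width.
    """
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        r0 = (i * h) // bins                    # floor of the cell's start row
        r1 = ((i + 1) * h + bins - 1) // bins   # ceil of the cell's end row
        row = []
        for j in range(bins):
            c0 = (j * w) // bins
            c1 = ((j + 1) * w + bins - 1) // bins
            cell = [fmap[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(cell) / len(cell))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled, h, w):
    """Nearest-neighbour upsampling of a square pooled map back to h x w."""
    bins = len(pooled)
    return [[pooled[y * bins // h][x * bins // w] for x in range(w)]
            for y in range(h)]


def biaxial_context(fmap):
    """Assumed 'biaxial' branch: average of each pixel's row mean and column mean."""
    h, w = len(fmap), len(fmap[0])
    row_mean = [sum(r) / w for r in fmap]
    col_mean = [sum(fmap[y][x] for y in range(h)) / h for x in range(w)]
    return [[0.5 * (row_mean[y] + col_mean[x]) for x in range(w)]
            for y in range(h)]


def hybrid_fuse(fmap, bin_sizes=(1, 2)):
    """Fuse the input, pyramid branches, and biaxial branch by an unweighted mean."""
    h, w = len(fmap), len(fmap[0])
    branches = [fmap]
    branches += [upsample_nearest(adaptive_avg_pool(fmap, b), h, w)
                 for b in bin_sizes]
    branches.append(biaxial_context(fmap))
    n = len(branches)
    return [[sum(b[y][x] for b in branches) / n for x in range(w)]
            for y in range(h)]


fused = hybrid_fuse([[1.0, 2.0], [3.0, 4.0]])
print(fused)
```

The pyramid branches supply context at several fixed scales, while the row and column averages give each pixel a view along the full height and width of the map, which is one way to mix short-range and long-range dependencies.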
