Efficient Hybrid CNN-GNN Architecture for Monocular Depth Estimation

Ishan Narayan

arXiv:2605.10251·cs.CV·May 12, 2026

TL;DR

GraphDepth is a hybrid CNN-GNN model for monocular depth estimation that efficiently models long-range spatial relationships, achieving competitive accuracy at lower computational cost.

Contribution

The paper integrates GNN layers into a CNN encoder-decoder in a scalable way, placing GraphSAGE layers at multiple scales and adding uncertainty estimation to improve depth prediction.
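The summary does not say which GraphSAGE aggregator is used, so the following is a minimal NumPy sketch of a standard GraphSAGE layer with a mean aggregator (Hamilton et al., 2017). It assumes that nodes are the pixels of a flattened CNN feature map and that every node has a fixed number k of neighbors; the function name `sage_mean_layer` and its arguments are hypothetical and not taken from the paper's code.

```python
import numpy as np

def sage_mean_layer(x, nbr_idx, w_self, w_nbr):
    """One GraphSAGE layer with a mean aggregator (hypothetical sketch).

    x:       (N, C_in) node features, e.g. flattened feature-map pixels
    nbr_idx: (N, k) integer indices of each node's neighbors
    w_self:  (C_in, C_out) weight applied to the node's own feature
    w_nbr:   (C_in, C_out) weight applied to the mean neighbor feature
    """
    nbr_mean = x[nbr_idx].mean(axis=1)            # (N, C_in): average over the k neighbors
    h = x @ w_self + nbr_mean @ w_nbr             # same as W @ concat(self, mean) with W split in two
    h = np.maximum(h, 0.0)                        # ReLU nonlinearity
    norm = np.linalg.norm(h, axis=1, keepdims=True)
    return h / np.maximum(norm, 1e-12)            # L2-normalize each node embedding (zero rows stay zero)
```

Because the neighbor features are averaged, the output does not depend on the order in which neighbors are listed. In a multi-scale design, a separate layer of this kind would be applied to each of the feature maps at 1/32, 1/16, and 1/8 resolution.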

Findings

01

Achieves accuracy within 4.6% of state-of-the-art transformers on indoor benchmarks.

02

Runs at 25 FPS with significantly lower VRAM usage than transformer-based methods.

03

Outperforms previous methods on the WHU Aerial dataset and generalizes well across domains.

Abstract

We present GraphDepth, a monocular depth estimation architecture that integrates Graph Neural Networks (GNNs) into a convolutional encoder-decoder framework. Our approach embeds efficient GraphSAGE layers at multiple scales of a ResNet-101 U-Net backbone, enabling explicit modeling of long-range spatial relationships that lie beyond the receptive field of local convolutions. Key technical contributions include: (1) batch-parallelized graph construction with configurable k-NN and grid-based adjacency for scalable training; (2) multi-scale GraphSAGE integration at bottleneck and decoder stages (1/32, 1/16, 1/8 resolution) to propagate global context throughout the feature hierarchy; (3) channel-attention gated skip connections that adaptively weight encoder features before fusion; and (4) heteroscedastic uncertainty estimation via a dedicated aleatoric…
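The two adjacency options in contribution (1) can be sketched as follows, assuming nodes are the pixels of a downsampled feature map. `knn_graph` builds a k-nearest-neighbor graph in feature space for a whole batch with one vectorized distance computation, and `grid_graph` builds a fixed spatial neighborhood that depends only on the map size, so it can be computed once and shared across the batch. Both function names, the replicate-at-border rule, and the brute-force O(N^2) distance matrix are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def knn_graph(feats, k):
    """Batched k-NN adjacency in feature space (hypothetical sketch).

    feats: (B, N, C) node features for B images with N nodes each.
    Returns (B, N, k) neighbor indices, excluding each node itself.
    """
    sq = (feats ** 2).sum(-1)                                     # (B, N) squared norms
    # Pairwise squared distances ||a||^2 + ||b||^2 - 2 a.b for every image at once
    d2 = sq[:, :, None] + sq[:, None, :] - 2.0 * feats @ feats.transpose(0, 2, 1)
    n = feats.shape[1]
    d2[:, np.arange(n), np.arange(n)] = np.inf                    # forbid self-loops
    return np.argsort(d2, axis=-1)[..., :k]                       # indices of the k smallest distances

def grid_graph(h, w, radius=1):
    """Fixed grid adjacency: each pixel links to its (2r+1)x(2r+1) window.

    Offsets that fall outside the map are clipped to the border
    (replicate padding), so every node has the same neighbor count.
    Returns (h*w, (2r+1)^2 - 1) neighbor indices in row-major order.
    """
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    offsets = [(dy, dx) for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1) if (dy, dx) != (0, 0)]
    nbrs = []
    for dy, dx in offsets:
        ny = np.clip(ys + dy, 0, h - 1)
        nx = np.clip(xs + dx, 0, w - 1)
        nbrs.append((ny * w + nx).ravel())
    return np.stack(nbrs, axis=1)
```

The dense distance matrix is affordable at coarse scales (at 1/32 resolution, a 480x640 input gives a 15x20 map, i.e. 300 nodes), but finer scales would typically call for a chunked or approximate neighbor search.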
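Contribution (3) gates encoder features with channel attention before they are fused with the decoder. One common way to build such a gate is the squeeze-and-excitation design (Hu et al., 2018). The NumPy sketch below assumes that design, a channel reduction folded into the weight shapes, and fusion by channel concatenation; the abstract confirms none of these choices, and `gated_skip` is a hypothetical name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_skip(enc, dec, w1, w2):
    """Squeeze-and-excitation style gate on a skip feature, then fusion (sketch).

    enc: (B, C, H, W) encoder feature arriving via the skip connection
    dec: (B, D, H, W) upsampled decoder feature at the same resolution
    w1:  (C, C_r) squeeze weights; w2: (C_r, C) excitation weights
    """
    s = enc.mean(axis=(2, 3))                      # (B, C): global average pooling
    g = sigmoid(np.maximum(s @ w1, 0.0) @ w2)      # (B, C): per-channel gate in (0, 1)
    gated = enc * g[:, :, None, None]              # reweight encoder channels
    return np.concatenate([gated, dec], axis=1)    # fuse by channel concatenation
```

Since each gate value lies between 0 and 1, the gate can only attenuate encoder channels, letting the network suppress skip features that conflict with the graph-refined decoder context.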
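The abstract is cut off before it describes the uncertainty head, so its exact loss is unknown. A common way to learn heteroscedastic aleatoric uncertainty for regression (Kendall and Gal, 2017) is to predict a per-pixel log-variance s alongside the depth and minimize the Gaussian negative log-likelihood 0.5 * exp(-s) * (y - y_hat)^2 + 0.5 * s. The sketch below assumes that formulation; `gaussian_nll` and its optional validity mask are hypothetical.

```python
import numpy as np

def gaussian_nll(pred_depth, log_var, target, valid=None):
    """Heteroscedastic Gaussian negative log-likelihood, constant term dropped.

    pred_depth, log_var, target: arrays of the same shape, e.g. (B, H, W).
    log_var is the predicted per-pixel log(sigma^2); predicting the log
    keeps the variance positive without an explicit constraint.
    valid: optional boolean mask selecting pixels that have ground truth.
    """
    per_pixel = 0.5 * np.exp(-log_var) * (target - pred_depth) ** 2 + 0.5 * log_var
    if valid is not None:
        per_pixel = per_pixel[valid]
    return per_pixel.mean()
```

Pixels with large residuals can shrink the first term by raising s, while the 0.5 * s term penalizes predicting high variance everywhere; for a fixed residual r, the loss is minimized at s = log(r^2).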
