Efficient Hybrid CNN-GNN Architecture for Monocular Depth Estimation
Ishan Narayan

TL;DR
GraphDepth is a hybrid CNN-GNN model for monocular depth estimation that models long-range spatial relationships efficiently, achieving competitive accuracy at lower computational cost than transformer-based methods.
Contribution
It introduces a scalable GNN integration within a CNN framework, with multi-scale GraphSAGE layers and uncertainty estimation for improved depth prediction.
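The summary does not say which loss drives the uncertainty estimate. As a minimal sketch, assuming the common Gaussian negative log-likelihood used for heteroscedastic (aleatoric) regression, a network would predict a depth and a log-variance per pixel and be trained as below. The function name `gaussian_nll` and its arguments are hypothetical and not taken from the paper.

```python
import numpy as np

def gaussian_nll(depth_pred, log_var, depth_true):
    """Mean per-pixel Gaussian NLL (constant term dropped).

    Assumed formulation, not confirmed by the paper:
        0.5 * exp(-s) * (d_true - d_pred)^2 + 0.5 * s,  with s = log(sigma^2).
    Predicting the log-variance keeps the variance positive without clamping.
    """
    sq_err = (depth_true - depth_pred) ** 2
    per_pixel = 0.5 * np.exp(-log_var) * sq_err + 0.5 * log_var
    return float(np.mean(per_pixel))

# Toy 2x2 "depth map" with a unit error at every pixel.
pred = np.zeros((2, 2))
true = np.ones((2, 2))
loss_calibrated = gaussian_nll(pred, np.zeros((2, 2)), true)          # sigma^2 = 1
loss_overconfident = gaussian_nll(pred, np.full((2, 2), -2.0), true)  # sigma^2 too small
```

Under this formulation a pixel with a large error is penalized less when the predicted variance is high, while the `0.5 * s` term stops the model from declaring high uncertainty everywhere.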
Findings
Achieves accuracy within 4.6% of state-of-the-art transformers on indoor benchmarks.
Runs at 25 FPS with significantly lower VRAM usage than transformer-based methods.
Outperforms previous methods on WHU Aerial and generalizes well cross-domain.
Abstract
We present GraphDepth, a monocular depth estimation architecture that synergistically integrates Graph Neural Networks (GNNs) within a convolutional encoder-decoder framework. Our approach embeds efficient GraphSAGE layers at multiple scales of a ResNet-101 U-Net backbone, enabling explicit modeling of long-range spatial relationships that lie beyond the receptive field of local convolutions. Key technical contributions include: (1) batch-parallelized graph construction with configurable k-NN and grid-based adjacency for scalable training; (2) multi-scale GraphSAGE integration at bottleneck and decoder stages (1/32, 1/16, 1/8 resolution) to propagate global context throughout the feature hierarchy; (3) channel-attention gated skip connections that adaptively weight encoder features before fusion; and (4) heteroscedastic uncertainty estimation via a dedicated aleatoric…
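To make the graph components named in the abstract concrete, here is a rough single-image sketch in NumPy of grid-based adjacency, k-NN adjacency, and one GraphSAGE layer with mean aggregation. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the k-NN is assumed to be computed in feature space, the grid is assumed to be 4-connected, and batching across images is omitted.

```python
import numpy as np

def grid_edges(h, w):
    """4-neighbour adjacency over an h x w feature map (assumed connectivity).

    Returns a (2, E) array of (src, dst) node indices, both directions included.
    """
    idx = np.arange(h * w).reshape(h, w)
    right = np.stack([idx[:, :-1].ravel(), idx[:, 1:].ravel()])
    down = np.stack([idx[:-1, :].ravel(), idx[1:, :].ravel()])
    e = np.concatenate([right, down], axis=1)
    return np.concatenate([e, e[::-1]], axis=1)  # add reversed edges

def knn_edges(x, k):
    """k-NN adjacency in feature space (assumption). x has shape (n, c).

    Returns a (2, n*k) array of edges pointing from each neighbour to the node.
    """
    d = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)            # exclude self-matches
    nbr = np.argsort(d, axis=1)[:, :k]     # (n, k) nearest neighbours
    dst = np.repeat(np.arange(x.shape[0]), k)
    return np.stack([nbr.ravel(), dst])

def sage_mean_layer(x, edges, w_self, w_nbr):
    """One GraphSAGE layer with mean aggregation; x is a float array of shape (n, c).

    h_i = ReLU(x_i @ w_self + mean_{j in N(i)} x_j @ w_nbr)
    """
    src, dst = edges
    agg = np.zeros_like(x)
    np.add.at(agg, dst, x[src])            # sum neighbour features per node
    deg = np.bincount(dst, minlength=x.shape[0]).astype(x.dtype)
    agg /= np.maximum(deg, 1)[:, None]     # mean; isolated nodes keep zeros
    return np.maximum(x @ w_self + agg @ w_nbr, 0.0)

# Flatten a toy 4x5 feature map with 8 channels into 20 graph nodes.
h, w, c = 4, 5, 8
feats = np.ones((h * w, c))
out = sage_mean_layer(feats, grid_edges(h, w), np.eye(c), np.eye(c))
```

At the coarse resolutions listed in the abstract (1/32, 1/16, 1/8) the node count stays small, which is what keeps this kind of message passing cheap relative to dense attention. The dense distance matrix in `knn_edges` is only practical for such small node counts.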
