NAIMA: Semantics Aware RGB Guided Depth Super-Resolution

Tayyab Nasir; Daochang Liu; Ajmal Mian

arXiv:2604.04407·eess.IV·April 7, 2026

TL;DR

NAIMA introduces a semantics-aware RGB-guided depth super-resolution method that leverages pretrained vision transformer embeddings and a novel attention module to improve depth map quality.

Contribution

The paper proposes an architecture that combines a pretrained DINOv2 vision transformer with Guided Token Attention (GTA) modules to incorporate semantic priors into depth super-resolution.

Findings

01. Achieves significant improvements over existing methods.

02. Effectively distills semantic knowledge from pretrained transformers.

03. Improves depth boundary sharpness and structural detail.

Abstract

Guided depth super-resolution (GDSR) is a multi-modal approach for depth map super-resolution that relies on a low-resolution depth map and a high-resolution RGB image to restore finer structural details. However, the misleading color and texture cues indicating depth discontinuities in RGB images often lead to artifacts and blurred depth boundaries in the generated depth map. We propose a solution that introduces global contextual semantic priors, generated from pretrained vision transformer token embeddings. Our approach to distilling semantic knowledge from pretrained token embeddings is motivated by their demonstrated effectiveness in related monocular depth estimation tasks. We introduce a Guided Token Attention (GTA) module, which iteratively aligns encoded RGB spatial features with depth encodings, using cross-attention for selectively injecting global semantic context extracted…
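The cross-attention step that the abstract mentions can be illustrated with a minimal sketch. This is generic single-head scaled dot-product cross-attention in plain Python, not the authors' GTA module: the function names (`cross_attention`, `inject_context`), the absence of learned projection matrices, the residual addition, and the toy feature values are all assumptions made for illustration. Depth features act as queries, and semantic tokens (standing in for pretrained transformer embeddings) act as keys and values.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries: N x d list of lists (here: depth features)
    keys:    M x d list of lists (here: semantic tokens)
    values:  M x d_v list of lists (here: the same semantic tokens)
    Returns (outputs of shape N x d_v, attention weights of shape N x M).
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights.append(softmax(scores))
    d_v = len(values[0])
    outputs = [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(d_v)]
        for row in weights
    ]
    return outputs, weights


def inject_context(depth_feats, semantic_tokens):
    """Hypothetical residual injection: depth features plus attended context."""
    attended, weights = cross_attention(depth_feats, semantic_tokens, semantic_tokens)
    fused = [
        [x + a for x, a in zip(feat, ctx)]
        for feat, ctx in zip(depth_feats, attended)
    ]
    return fused, weights


# Toy inputs (assumed values): 2 depth feature vectors, 3 semantic tokens, d = 2.
depth_feats = [[1.0, 0.0], [0.0, 1.0]]
semantic_tokens = [[4.0, 0.0], [0.0, 4.0], [1.0, 1.0]]
fused, weights = inject_context(depth_feats, semantic_tokens)
```

Each row of `weights` is a probability distribution over the semantic tokens, so a depth feature draws most of its injected context from the tokens it aligns with best. A real model would apply learned query, key, and value projections, use multiple heads, and repeat the step across stages; none of those details are specified in the truncated abstract above.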
