Last-Layer-Centric Feature Recombination: Unleashing 3D Geometric Knowledge in DINOv3 for Monocular Depth Estimation

Gongshu Wang; Zhirui Wang; Kan Yang

arXiv:2604.26454 · cs.CV · April 30, 2026

TL;DR

This paper presents a layer-wise analysis of DINOv3 for monocular depth estimation, finds that 3D geometric information is distributed non-uniformly across layers, and introduces a Last-Layer-Centric Feature Recombination (LFR) module to improve depth estimation accuracy.

Contribution

The paper shows that 3D geometric knowledge is distributed non-uniformly across DINOv3 layers, with deeper layers more predictive of depth, and proposes a feature recombination method centered on the last layer to improve depth estimation performance.

Findings

01  Deeper layers in DINOv3 have stronger depth predictability.

02  The proposed LFR module improves monocular depth estimation accuracy.

03  LFR achieves state-of-the-art performance on benchmark datasets.

Abstract

Monocular depth estimation (MDE) is a fundamental yet inherently ill-posed task. Recent vision foundation models (VFMs), particularly DINO-based transformers, have significantly improved accuracy and generalization for dense prediction. Prior works generally follow a unified paradigm: sampling a fixed set of intermediate transformer layers at uniform intervals to build multi-scale features. This common practice implicitly assumes that geometric information is uniformly distributed across layers, which may underutilize the structural 3D cues encoded in VFMs. In this study, we present a systematic layer-wise analysis of DINOv3, revealing that 3D information is distributed non-uniformly: deeper layers exhibit stronger depth predictability and better capture inter-sample geometric variation. Motivated by this, we introduce a Last-Layer-Centric Feature Recombination (LFR) module to enhance…
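
The two ideas the abstract contrasts, uniform-interval layer sampling and layer-wise depth predictability, can be illustrated with a minimal sketch. This is not the authors' analysis code and does not implement the LFR module. The function names (`uniform_layer_indices`, `linear_probe_r2`, `toy_layer_features`) are hypothetical, the "features" are synthetic NumPy arrays rather than DINOv3 activations, and the increase of depth signal with layer index is built into the toy data by assumption. The in-sample linear probe is one common way to measure predictability and may differ from the paper's protocol.

```python
import numpy as np


def uniform_layer_indices(num_layers: int, num_taps: int) -> list[int]:
    """Return `num_taps` layer indices at equal intervals, ending at the last layer."""
    if num_layers <= 0 or num_taps <= 0 or num_layers % num_taps != 0:
        raise ValueError("num_layers must be a positive multiple of num_taps")
    step = num_layers // num_taps
    return [step * (i + 1) - 1 for i in range(num_taps)]


def linear_probe_r2(features: np.ndarray, depth: np.ndarray) -> float:
    """In-sample R^2 of a least-squares linear map (with bias) from features to depth."""
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, depth, rcond=None)
    residual = depth - design @ weights
    ss_res = float(residual @ residual)
    ss_tot = float(np.sum((depth - depth.mean()) ** 2))
    return 1.0 - ss_res / ss_tot


def toy_layer_features(depth: np.ndarray, num_layers: int = 12,
                       dim: int = 8, seed: int = 0) -> list[np.ndarray]:
    """SYNTHETIC stand-in for per-layer features.

    By construction (an assumption for illustration only), the depth signal
    embedded in the features gets stronger as the layer index grows.
    """
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    strengths = np.linspace(0.1, 1.0, num_layers)
    return [
        s * depth[:, None] * direction[None, :]
        + rng.normal(size=(depth.shape[0], dim))
        for s in strengths
    ]


# Toy "ground-truth" depth values, one scalar per sample.
depth = np.random.default_rng(1).normal(size=500)
layers = toy_layer_features(depth)

# Layer-wise probe: one predictability score per layer.
scores = [linear_probe_r2(f, depth) for f in layers]

# Uniform-interval taps for a 12-layer backbone: [2, 5, 8, 11].
taps = uniform_layer_indices(len(layers), 4)
```

Comparing `scores` at the indices in `taps` against the full list shows what a layer-wise analysis adds: uniform taps treat every sampled layer as equally informative, while per-layer scores reveal where the depth signal is concentrated. With real backbone features, the probe would normally be evaluated on held-out data rather than in-sample.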
