DeFM: Learning Foundation Representations from Depth for Robotics

Manthan Patel; Jonas Frey; Mayank Mittal; Fan Yang; Alexander Hansson; Amir Bar; Cesar Cadena; Marco Hutter

arXiv:2601.18923 · cs.RO · January 28, 2026

TL;DR

DeFM is a self-supervised foundation model trained on 60 million depth images that learns geometric and semantic representations, enabling robust robotic perception and manipulation across diverse environments without task-specific fine-tuning.

Contribution

This work introduces DeFM, the first large-scale self-supervised foundation model for depth images, with novel normalization and distillation techniques for robotic applications.

Findings

01  Achieves state-of-the-art results on multiple depth-based benchmarks

02  Demonstrates strong generalization from simulation to real-world environments

03  Provides pretrained models for off-the-shelf robotic depth perception

Abstract

Depth sensors are widely deployed across robotic platforms, and advances in fast, high-fidelity depth simulation have enabled robotic policies trained on depth observations to achieve robust sim-to-real transfer for a wide range of tasks. Despite this, representation learning for the depth modality remains underexplored compared to RGB, where large-scale foundation models now define the state of the art. To address this gap, we present DeFM, a self-supervised foundation model trained entirely on depth images for robotic applications. Using a DINO-style self-distillation objective on a curated dataset of 60M depth images, DeFM learns geometric and semantic representations that generalize to diverse environments, tasks, and sensors. To retain metric awareness across multiple scales, we introduce a novel input normalization strategy. We further distill DeFM into compact models suitable for…
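
For readers unfamiliar with the term, a DINO-style self-distillation objective can be sketched roughly as follows. This is a generic NumPy illustration of the published DINO recipe (a student matched to a centered, sharpened teacher, with the teacher updated as a moving average of the student), not DeFM's actual training code. The function names, temperatures, and momentum values are assumptions chosen for illustration, and the paper's input normalization strategy is not shown because the abstract does not describe it.

```python
import numpy as np


def softmax(logits, temperature):
    """Temperature-scaled softmax over the last axis, numerically stabilized."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between teacher and student output distributions.

    The teacher output is centered and sharpened with a low temperature;
    in a real training loop no gradient would flow through the teacher.
    Temperatures here are illustrative defaults, not values from the paper.
    """
    teacher_probs = softmax(teacher_logits - center, teacher_temp)
    z = student_logits / student_temp
    z = z - z.max(axis=-1, keepdims=True)
    student_log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return float(-(teacher_probs * student_log_probs).sum(axis=-1).mean())


def ema_update(old, new, momentum):
    """Exponential moving average, usable for teacher weights and the center."""
    return momentum * old + (1.0 - momentum) * new


# Toy usage: random arrays stand in for the projection-head outputs
# that two augmented views of the same depth image would produce.
rng = np.random.default_rng(0)
student_out = rng.normal(size=(8, 16))   # student output for view A
teacher_out = rng.normal(size=(8, 16))   # teacher output for view B
center = np.zeros(16)

loss = dino_loss(student_out, teacher_out, center)
# Track the mean teacher output so that no single dimension dominates.
center = ema_update(center, teacher_out.mean(axis=0), momentum=0.9)
```

In a full setup the student would be trained by gradient descent on this loss across several crops per image, and the teacher weights would be refreshed with `ema_update` after each step.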

Code & Models

Models

Hugging Face model: leggedrobotics/defm

Taxonomy

Topics: Robot Manipulation and Learning · Robotics and Sensor-Based Localization · Advanced Neural Network Applications