Scene-Agnostic Traversability Labeling and Estimation via a Multimodal Self-supervised Framework

Zipeng Fang; Yanbo Wang; Lei Zhao; Weidong Chen

arXiv:2508.18249·cs.RO·August 26, 2025

TL;DR

This paper introduces a multimodal self-supervised framework that combines footprint, LiDAR, and camera data to label and estimate traversability for robots, reaching around 88% IoU across urban, off-road, and campus environments.

Contribution

The paper presents a multimodal self-supervised approach, consisting of an automatic annotation pipeline and a dual-stream network, that improves traversability recognition over prior single-modality methods.

Findings

01

Achieves around 88% IoU in diverse environments.

02

Outperforms existing self-supervised methods by 1.6-3.5% IoU.

03

Effective across urban, off-road, and campus settings.
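
For readers unfamiliar with the metric behind these findings, the following is a minimal sketch of intersection over union (IoU) for a binary traversability mask. It is an illustration only, not the authors' evaluation code: the function name `binary_iou`, the nested-list mask format, and the convention for two empty masks are assumptions, and this summary does not state whether the reported figure is a per-class or mean IoU.

```python
def binary_iou(pred, target):
    """IoU of the traversable class for two same-shaped binary masks.

    Masks are nested lists of 0/1 values (hypothetical format chosen
    for illustration; 1 marks a traversable pixel).
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            # Pixel is in the intersection if both masks mark it traversable.
            intersection += 1 if (p and t) else 0
            # Pixel is in the union if either mask marks it traversable.
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy example: the prediction misses one traversable pixel of the label.
pred = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
target = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
print(binary_iou(pred, target))  # 4 shared pixels / 5 pixels in the union = 0.8
```

On this scale, an IoU of about 88% means the predicted and reference traversable regions overlap on roughly 88% of the area covered by either one.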

Abstract

Traversability estimation is critical for enabling robots to navigate across diverse terrains and environments. While recent self-supervised learning methods achieve promising results, they often fail to capture the characteristics of non-traversable regions. Moreover, most prior works concentrate on a single modality, overlooking the complementary strengths offered by integrating heterogeneous sensory modalities for more robust traversability estimation. To address these limitations, we propose a multimodal self-supervised framework for traversability labeling and estimation. First, our annotation pipeline integrates footprint, LiDAR, and camera data as prompts for a vision foundation model, generating traversability labels that account for both semantic and geometric cues. Then, leveraging these labels, we train a dual-stream network that jointly learns from different modalities in a…
