Scene-Agnostic Traversability Labeling and Estimation via a Multimodal Self-supervised Framework
Zipeng Fang, Yanbo Wang, Lei Zhao, Weidong Chen

TL;DR
This paper introduces a multimodal self-supervised framework that combines footprint, LiDAR, and camera data to estimate traversability for robots across varied environments, reaching around 88% IoU.
Contribution
The paper presents a multimodal self-supervised approach that pairs an annotation pipeline with a dual-stream network, improving traversability estimation over prior single-modality methods.
Findings
Achieves around 88% IoU in diverse environments.
Outperforms existing self-supervised methods by 1.6-3.5% IoU.
Effective across urban, off-road, and campus settings.
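The findings above are reported in IoU (intersection over union). As a reminder of what that number measures, here is a minimal sketch of the standard binary-mask definition. It is not the paper's evaluation code: the function name `binary_iou`, the nested-list mask format, and the convention of returning 1.0 for two empty masks are all assumptions made for illustration.

```python
def binary_iou(pred, target):
    """Intersection over union of two same-shaped binary masks.

    Masks are nested lists of 0/1 values (hypothetical format), where 1
    marks a pixel labeled traversable.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of rows")

    intersection = 0  # pixels marked 1 in both masks
    union = 0         # pixels marked 1 in at least one mask
    for row_p, row_t in zip(pred, target):
        if len(row_p) != len(row_t):
            raise ValueError("masks must have the same number of columns")
        for p, t in zip(row_p, row_t):
            p, t = bool(p), bool(t)
            intersection += int(p and t)
            union += int(p or t)

    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


# Toy example: 2 pixels overlap, 4 pixels are covered by either mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(binary_iou(pred, target))  # 0.5
```

On this scale, a score of 0.88 means the overlap between predicted and reference traversable regions covers 88% of their combined area.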
Abstract
Traversability estimation is critical for enabling robots to navigate across diverse terrains and environments. While recent self-supervised learning methods achieve promising results, they often fail to capture the characteristics of non-traversable regions. Moreover, most prior works concentrate on a single modality, overlooking the complementary strengths offered by integrating heterogeneous sensory modalities for more robust traversability estimation. To address these limitations, we propose a multimodal self-supervised framework for traversability labeling and estimation. First, our annotation pipeline integrates footprint, LiDAR, and camera data as prompts for a vision foundation model, generating traversability labels that account for both semantic and geometric cues. Then, leveraging these labels, we train a dual-stream network that jointly learns from different modalities in a…
