Watch Your STEPP: Semantic Traversability Estimation using Pose Projected Features
Sebastian Ægidius, Dennis Hadjivelichkov, Jianhao Jiao, Jonathan Embley-Riches, Dimitrios Kanoulas

TL;DR
This paper introduces a novel terrain traversability estimation method using dense feature embeddings from vision transformers, enabling legged robots to better navigate complex and hazardous environments by detecting anomalies.
Contribution
The work presents a new approach combining vision transformer features with an encoder-decoder model for terrain analysis, specifically tailored for legged robot navigation in unstructured terrains.
Findings
Effective anomaly detection in terrain using reconstruction error
Successful real-world tests on the ANYmal robot indoors and outdoors
Open-source code and video demonstrations available
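As a rough illustration of the first finding, the sketch below flags unfamiliar feature vectors by their reconstruction error. It is not the authors' model: a linear encoder-decoder (PCA via SVD) stands in for the trained network, the synthetic data, the function names `fit_linear_autoencoder` and `reconstruction_error`, and the mean-plus-three-standard-deviations threshold are all assumptions made for this example.

```python
import numpy as np

def fit_linear_autoencoder(train, latent_dim):
    """Fit a linear encoder-decoder (PCA) on the rows of `train`."""
    mean = train.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    basis = vt[:latent_dim]  # shape (latent_dim, D)
    return mean, basis

def reconstruction_error(x, mean, basis):
    """Per-row mean squared error between x and its reconstruction."""
    centered = np.atleast_2d(x) - mean
    latent = centered @ basis.T   # encode into the low-dimensional space
    recon = latent @ basis        # decode back to feature space
    return np.mean((centered - recon) ** 2, axis=1)

rng = np.random.default_rng(0)
D, latent_dim = 16, 4

# Hypothetical "familiar" features: points near a 4-D subspace of a 16-D space.
mixing = rng.normal(size=(latent_dim, D))
familiar = rng.normal(size=(200, latent_dim)) @ mixing
familiar += 0.01 * rng.normal(size=familiar.shape)

mean, basis = fit_linear_autoencoder(familiar, latent_dim)
train_err = reconstruction_error(familiar, mean, basis)

# Assumed decision rule: anything well above the training error is anomalous.
threshold = train_err.mean() + 3.0 * train_err.std()

new_familiar = rng.normal(size=(1, latent_dim)) @ mixing  # same subspace
unfamiliar = 3.0 * rng.normal(size=(1, D))                # arbitrary direction

for name, sample in [("familiar", new_familiar), ("unfamiliar", unfamiliar)]:
    err = reconstruction_error(sample, mean, basis)[0]
    print(name, "anomalous" if err > threshold else "traversable-like")
```

The same scoring logic applies if the linear model is swapped for a learned encoder-decoder: only the `reconstruction_error` computation changes, while the thresholding step stays the same.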
Abstract
Understanding the traversability of terrain is essential for autonomous robot navigation, particularly in unstructured environments such as natural landscapes. Although traditional methods, such as occupancy mapping, provide a basic framework, they often fail to account for the complex mobility capabilities of some platforms such as legged robots. In this work, we propose a method for estimating terrain traversability by learning from demonstrations of human walking. Our approach leverages dense, pixel-wise feature embeddings generated using the DINOv2 vision Transformer model, which are processed through an encoder-decoder MLP architecture to analyze terrain segments. The averaged feature vectors, extracted from the masked regions of interest, are used to train the model in a reconstruction-based framework. By minimizing reconstruction loss, the network distinguishes between familiar…
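The abstract's step of averaging feature vectors over a masked region can be sketched as follows. This is a minimal example under stated assumptions: the dense feature map is taken to be an array of shape (H, W, D), the mask a boolean array of shape (H, W), and the helper name `masked_mean_feature` is hypothetical. How the masks and the DINOv2 features are actually produced is not covered here.

```python
import numpy as np

def masked_mean_feature(features, mask):
    """Average the D-dimensional feature vectors at pixels where mask is True.

    features: array of shape (H, W, D), one feature vector per pixel.
    mask:     boolean array of shape (H, W) selecting the region of interest.
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("mask selects no pixels")
    # Boolean indexing gathers the selected pixels into shape (N, D).
    return features[mask].mean(axis=0)

# Toy 2x3 "image" with 4-dimensional per-pixel features.
features = np.zeros((2, 3, 4))
features[0, 0] = [1, 2, 3, 4]
features[1, 2] = [3, 4, 5, 6]

mask = np.zeros((2, 3), dtype=bool)
mask[0, 0] = True
mask[1, 2] = True

segment_vector = masked_mean_feature(features, mask)
print(segment_vector)
```

Each segment is thereby reduced to a single D-dimensional vector, which is the kind of input a reconstruction-based model would then be trained on.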
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Softmax · Adam · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding · Linear Layer · Vision Transformer · Multi-Head Attention
