Learning Semantic Traversability with Egocentric Video and Automated Annotation Strategy
Yunho Kim, Jeong Hyun Lee, Choongin Lee, Juhyeok Mun, Donghoon Youm, Jeongsoo Park, Jemin Hwangbo

TL;DR
This paper introduces a scalable method for training semantic traversability estimators for autonomous robots using egocentric videos and automated annotation, reducing manual effort and improving generalization across urban environments.
Contribution
The authors propose a novel automated annotation strategy leveraging foundation models and egocentric videos, enabling scalable training of semantic traversability estimators without manual labeling.
Findings
High accuracy in diverse urban scenarios
Effective handling of various camera viewpoints
Demonstrated real-world deployment success
Abstract
For reliable autonomous robot navigation in urban settings, the robot must have the ability to identify semantically traversable terrains in the image based on the semantic understanding of the scene. This reasoning ability is based on semantic traversability, which is frequently achieved using semantic segmentation models fine-tuned on the testing domain. This fine-tuning process often involves manual data collection with the target robot and annotation by human labelers, which is prohibitively expensive and unscalable. In this work, we present an effective methodology for training a semantic traversability estimator using egocentric videos and an automated annotation process. Egocentric videos are collected from a camera mounted on a pedestrian's chest. The dataset for training the semantic traversability estimator is then automatically generated by extracting semantically traversable…
Taxonomy
Topics: Video Analysis and Summarization · Natural Language Processing Techniques
