SANPO: A Scene Understanding, Accessibility and Human Navigation Dataset

Sagar M. Waghmare; Kimberly Wilber; Dave Hawkey; Xuan Yang; Matthew Wilson; Stephanie Debats; Cattalyya Nuengsigkapian; Astuti Sharma; Lars Pandikow; Huisheng Wang; Hartwig Adam; Mikhail Sirotenko

arXiv:2309.12172·cs.CV·December 23, 2024

Open Access

TL;DR

SANPO is a comprehensive outdoor egocentric video dataset with dense annotations, designed to advance assistive navigation technologies for visually impaired individuals by providing real and synthetic data for training and evaluation.

Contribution

We introduce SANPO, a large-scale, annotated egocentric video dataset for outdoor human navigation, filling a critical gap in datasets for assistive vision technologies.

Findings

01

SANPO contains 701 real-world stereo videos with dense panoptic segmentation.

02

The dataset includes 1961 synthetic videos with high-quality annotations.

03

SANPO is already aiding mobile models for assistive navigation applications.

Abstract

Vision is essential for human navigation. The World Health Organization (WHO) estimates that 43.3 million people were blind in 2020, and this number is projected to reach 61 million by 2050. Modern scene understanding models could empower these people by assisting them with navigation, obstacle avoidance and visual recognition capabilities. The research community needs high quality datasets for both training and evaluation to build these systems. While datasets for autonomous vehicles are abundant, there is a critical gap in datasets tailored for outdoor human navigation. This gap poses a major obstacle to the development of computer vision based Assistive Technologies. To overcome this obstacle, we present SANPO, a large-scale egocentric video dataset designed for dense prediction in outdoor human navigation environments. SANPO contains 701 stereo videos of 30+ seconds captured in…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Human Pose and Action Recognition