Perception Framework through Real-Time Semantic Segmentation and Scene Recognition on a Wearable System for the Visually Impaired
Yingzhi Zhang, Haoye Chen, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen

TL;DR
This paper introduces a multi-task perception system for visually impaired individuals that combines real-time semantic segmentation and scene recognition on a wearable device, enhancing navigation assistance.
Contribution
A novel multi-task neural network architecture with shared parameters and attention mechanisms for efficient scene parsing and recognition on wearable hardware.
Findings
High accuracy on public datasets and real-world scenes
Real-time performance on wearable hardware
Effective integration of semantic and scene information
Abstract
Because scene information, including objectness and scene type, is important for people with visual impairment, in this work we present an efficient multi-task perception system for the scene parsing and recognition tasks. Building on the compact ResNet backbone, our designed network architecture has two paths with shared parameters. In the structure, the semantic segmentation path integrates fast attention, with the aim of harvesting long-range contextual information in an efficient manner. Simultaneously, the scene recognition path infers the scene type by passing the semantic features into semantic-driven attention networks and combining the semantically extracted representations with the RGB-extracted representations through a gated attention module. In the experiments, we have verified the system's accuracy and efficiency on both public datasets and real-world scenes. This…
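The abstract does not specify how the gated attention module combines the two representations. As a rough illustration only, the sketch below shows one common form of gated fusion: a sigmoid gate computed from the concatenated RGB and semantic features decides, per channel, how much of each source to keep. The function name `gated_fusion`, the parameters `gate_weights` and `gate_bias`, and the convex-combination formula are assumptions made for this example, not the authors' implementation.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(rgb_feat, sem_feat, gate_weights, gate_bias):
    """Blend two equal-length feature vectors with a per-channel gate.

    Hypothetical formulation (not taken from the paper):
        gate[c]  = sigmoid(gate_weights[c] . concat(rgb, sem) + gate_bias[c])
        fused[c] = gate[c] * rgb[c] + (1 - gate[c]) * sem[c]
    """
    if len(rgb_feat) != len(sem_feat):
        raise ValueError("feature vectors must have the same length")

    concat = list(rgb_feat) + list(sem_feat)
    fused = []
    for c, (r, s) in enumerate(zip(rgb_feat, sem_feat)):
        # One row of gate weights per output channel, applied to both inputs.
        logit = sum(w * x for w, x in zip(gate_weights[c], concat)) + gate_bias[c]
        g = sigmoid(logit)
        fused.append(g * r + (1.0 - g) * s)
    return fused


# Usage: with all-zero gate parameters every gate is 0.5,
# so the result is the plain average of the two inputs.
rgb = [1.0, 2.0]
sem = [3.0, 4.0]
zero_weights = [[0.0] * 4 for _ in range(2)]
averaged = gated_fusion(rgb, sem, zero_weights, [0.0, 0.0])
```

In a trained network the gate parameters would be learned, so the module could lean on the RGB features for some channels and on the semantic features for others.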
Taxonomy
Topics: Advanced Neural Network Applications · Retinal Imaging and Analysis · Video Surveillance and Tracking Methods
Methods: Residual Connection · Max Pooling · Average Pooling · Residual Block · Kaiming Initialization · Global Average Pooling · Batch Normalization · 1x1 Convolution · Bottleneck Residual Block
