Perception Framework through Real-Time Semantic Segmentation and Scene Recognition on a Wearable System for the Visually Impaired

Yingzhi Zhang; Haoye Chen; Kailun Yang; Jiaming Zhang; Rainer Stiefelhagen

arXiv:2103.04136 · cs.CV · March 9, 2021 · 1 citation

TL;DR

This paper introduces a multi-task perception system for visually impaired individuals that combines real-time semantic segmentation and scene recognition on a wearable device, enhancing navigation assistance.

Contribution

A novel multi-task neural network architecture with shared parameters and attention mechanisms for efficient scene parsing and recognition on wearable hardware.

Findings

01 High accuracy on public datasets and real-world scenes

02 Real-time performance on wearable hardware

03 Effective integration of semantic and scene information

Abstract

Because scene information, including objectness and scene type, is important for people with visual impairment, we present an efficient multi-task perception system for scene parsing and scene recognition. Built on a compact ResNet backbone, the network has two paths with shared parameters. The semantic segmentation path integrates fast attention to harvest long-range contextual information efficiently. In parallel, the scene recognition path infers the scene type by passing the semantic features into semantic-driven attention networks and combining the semantically extracted representations with the RGB-extracted representations through a gated attention module. In the experiments, we have verified the system's accuracy and efficiency on both public datasets and real-world scenes. This…
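The abstract does not spell out how the gated attention module combines the two representations. As a rough illustration only, one common form of gated fusion computes a per-channel gate from both feature vectors and uses it to blend them. The sketch below assumes that form; the function names, the weight layout, and the blending rule are hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(rgb_feat, sem_feat, weights, bias):
    """Blend two equal-length feature vectors with a per-channel gate.

    Assumed (hypothetical) formulation, not the paper's:
        gate  = sigmoid(W @ concat(rgb, sem) + b)
        fused = gate * rgb + (1 - gate) * sem
    `weights` has one row per channel, each row of length 2 * channels.
    """
    if len(rgb_feat) != len(sem_feat):
        raise ValueError("feature vectors must have the same length")
    joint = list(rgb_feat) + list(sem_feat)
    fused = []
    for c, (r, s) in enumerate(zip(rgb_feat, sem_feat)):
        # One gate value per channel, computed from both feature sources.
        pre_activation = sum(w * x for w, x in zip(weights[c], joint)) + bias[c]
        g = sigmoid(pre_activation)
        fused.append(g * r + (1.0 - g) * s)
    return fused


if __name__ == "__main__":
    random.seed(0)
    channels = 4
    # Stand-ins for pooled RGB-branch and semantic-branch features.
    rgb = [random.uniform(-1, 1) for _ in range(channels)]
    sem = [random.uniform(-1, 1) for _ in range(channels)]
    W = [[random.uniform(-0.5, 0.5) for _ in range(2 * channels)]
         for _ in range(channels)]
    b = [0.0] * channels
    print(gated_fusion(rgb, sem, W, b))
```

With all-zero weights and biases every gate equals 0.5, so the output is the plain average of the two inputs; a large positive bias pushes the gate toward 1 and the output toward the RGB features. In a real network the gate parameters would be learned jointly with the rest of the model.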

Taxonomy

Topics: Advanced Neural Network Applications · Retinal Imaging and Analysis · Video Surveillance and Tracking Methods

Methods: Residual Connection · Max Pooling · Average Pooling · Residual Block · Kaiming Initialization · Global Average Pooling · Batch Normalization · 1x1 Convolution · Bottleneck Residual Block