Trans4Trans: Efficient Transformer for Transparent Object Segmentation to Help Visually Impaired People Navigate in the Real World

Jiaming Zhang; Kailun Yang; Angela Constantinescu; Kunyu Peng; Karin Müller; Rainer Stiefelhagen

arXiv:2107.03172·cs.CV·August 23, 2021·1 cites

TL;DR

This paper introduces Trans4Trans, an efficient Transformer-based model designed for real-time segmentation of transparent objects to aid visually impaired individuals in navigation, demonstrating superior performance and practical usability.

Contribution

The paper presents a novel dual-head Transformer model with a Transformer Parsing Module for joint learning, optimized for transparent object segmentation in assistive navigation systems.

Findings

01 Outperforms state-of-the-art methods on the Stanford2D3D and Trans10K-v2 datasets

02 Achieves mIoU of 45.13% on Stanford2D3D and 75.14% on Trans10K-v2

03 Validated through user studies in real-world scenarios
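
For readers unfamiliar with the mIoU metric quoted in the findings, the following is a minimal sketch of how mean intersection-over-union is commonly computed for semantic segmentation. It is not taken from the paper or its repository: the function name `mean_iou`, the flat label-list inputs, and the `ignore_index` convention are assumptions made for illustration, and the paper's evaluation protocol may differ in details such as how absent classes are handled.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_index: int = -1,  # hypothetical "unlabeled" marker, an assumption
) -> float:
    """Mean IoU over classes that occur in the prediction or the ground truth."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # pixels where pred == target == c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labeled as class c

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip unlabeled pixels entirely
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # classes absent from both pred and target are skipped
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six "pixels", three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. A reported score such as 75.14% is this kind of average expressed as a percentage over a dataset's classes.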

Abstract

Common fully glazed facades and transparent objects present architectural barriers and impede the mobility of people with low vision or blindness. For instance, a path detected behind a glass door is inaccessible unless the door is correctly perceived and reacted to. However, segmenting these safety-critical objects is rarely covered by conventional assistive technologies. To tackle this issue, we construct a wearable system with a novel dual-head Transformer for Transparency (Trans4Trans) model, which is capable of segmenting general and transparent objects and performing real-time wayfinding to help people walk alone more safely. In particular, both decoders created by our proposed Transformer Parsing Module (TPM) enable effective joint learning from different datasets. In addition, the efficient Trans4Trans model, composed of a symmetric transformer-based encoder and decoder, requires little…

Code & Models

Repositories

jamycheung/Trans4Trans (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Tactile and Sensory Interactions · Video Surveillance and Tracking Methods

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Softmax · Dense Connections · Adam