Trans4Trans: Efficient Transformer for Transparent Object Segmentation to Help Visually Impaired People Navigate in the Real World
Jiaming Zhang, Kailun Yang, Angela Constantinescu, Kunyu Peng, Karin Müller, Rainer Stiefelhagen

TL;DR
This paper introduces Trans4Trans, an efficient Transformer-based model designed for real-time segmentation of transparent objects to aid visually impaired individuals in navigation, demonstrating superior performance and practical usability.
Contribution
The paper presents a novel dual-head Transformer model with a Transformer Parsing Module for joint learning, optimized for transparent object segmentation in assistive navigation systems.
Findings
Outperforms state-of-the-art methods on the Stanford2D3D and Trans10K-v2 datasets
Achieves mIoU of 45.13% on Stanford2D3D and 75.14% on Trans10K-v2
Validated through user studies in real-world scenarios
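For readers unfamiliar with the metric quoted above, mIoU (mean intersection-over-union) averages, over classes, the ratio of correctly labelled pixels to all pixels assigned to that class in either the prediction or the ground truth. The following is a minimal sketch of the standard definition, not the paper's evaluation code: the function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration, and the authors' protocol (for example, handling of ignored labels) may differ.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the ground truth.

    pred and target are flat sequences of integer class labels, one per pixel.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    intersection = [0] * num_classes  # true positives per class
    union = [0] * num_classes         # TP + FP + FN per class
    for p, t in zip(pred, target):
        if p == t:
            intersection[p] += 1
            union[p] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class
    # Skip classes that never occur, so they do not distort the average.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.2%}")
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. Reported figures such as those above are this quantity computed over a whole test set and expressed as a percentage.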
Abstract
Common fully glazed facades and transparent objects present architectural barriers and impede the mobility of people with low vision or blindness. For instance, a path detected behind a glass door is inaccessible unless it is correctly perceived and reacted to. However, segmenting these safety-critical objects is rarely covered by conventional assistive technologies. To tackle this issue, we construct a wearable system with a novel dual-head Transformer for Transparency (Trans4Trans) model, which is capable of segmenting general and transparent objects and performing real-time wayfinding to assist people walking alone more safely. In particular, both decoders created by our proposed Transformer Parsing Module (TPM) enable effective joint learning from different datasets. Besides, the efficient Trans4Trans model, composed of a symmetric transformer-based encoder and decoder, requires little…
Taxonomy
Topics: Advanced Neural Network Applications · Tactile and Sensory Interactions · Video Surveillance and Tracking Methods
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Softmax · Dense Connections · Adam
