Trans4Trans: Efficient Transformer for Transparent Object and Semantic Scene Segmentation in Real-World Navigation Assistance
Jiaming Zhang, Kailun Yang, Angela Constantinescu, Kunyu Peng, Karin Müller, Rainer Stiefelhagen

TL;DR
This paper introduces Trans4Trans, a lightweight Transformer-based model for real-time segmentation of transparent and general objects, enhancing navigation safety for visually impaired users in diverse environments.
Contribution
The paper presents a novel dual-head Transformer model with a lightweight parsing module, achieving robust segmentation of transparent objects while maintaining efficiency on portable hardware.
Findings
Outperforms state-of-the-art on Stanford2D3D and Trans10K-v2 datasets.
Achieves high mIoU scores on Cityscapes, ACDC, and DADA-seg datasets.
Validated through user studies for real-world navigation assistance.
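The mIoU (mean intersection-over-union) metric cited above can be illustrated with a minimal sketch. This is the generic definition of the metric, not the paper's evaluation code. The function name `mean_iou`, its parameters, and the toy labels are hypothetical, and the sketch assumes flattened lists of integer class labels.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Average per-class IoU over classes that occur in pred or target.

    pred and target are flat sequences of integer class labels of equal
    length (an assumption of this sketch, e.g. a flattened label map).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip pixels marked as unlabeled in the ground truth
        if p == t:
            # The pixel belongs to both the predicted and the true region.
            intersection[t] += 1
            union[t] += 1
        else:
            # The pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Classes absent from both prediction and ground truth are left out.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so `score` is approximately 0.5. Benchmark toolkits usually accumulate a confusion matrix over the whole dataset before averaging, which can differ from averaging per-image scores.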
Abstract
Transparent objects, such as glass walls and doors, constitute architectural obstacles hindering the mobility of people with low vision or blindness. For instance, the open space behind glass doors is inaccessible, unless it is correctly perceived and interacted with. However, traditional assistive technologies rarely cover the segmentation of these safety-critical transparent objects. In this paper, we build a wearable system with a novel dual-head Transformer for Transparency (Trans4Trans) perception model, which can segment general- and transparent objects. The two dense segmentation results are further combined with depth information in the system to help users navigate safely and assist them to negotiate transparent obstacles. We propose a lightweight Transformer Parsing Module (TPM) to perform multi-scale feature interpretation in the transformer-based decoder. Benefiting from…
Taxonomy
Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Adam · Label Smoothing · Softmax · Byte Pair Encoding · Residual Connection
