Trans4Trans: Efficient Transformer for Transparent Object and Semantic Scene Segmentation in Real-World Navigation Assistance

Jiaming Zhang; Kailun Yang; Angela Constantinescu; Kunyu Peng; Karin Müller; Rainer Stiefelhagen

arXiv:2108.09174·cs.CV·August 23, 2021

TL;DR

This paper introduces Trans4Trans, a lightweight Transformer-based model for real-time segmentation of transparent and general objects, enhancing navigation safety for visually impaired users in diverse environments.

Contribution

The paper presents a novel dual-head Transformer model with a lightweight parsing module, achieving robust segmentation of transparent objects while maintaining efficiency on portable hardware.

Findings

01 Outperforms state-of-the-art methods on the Stanford2D3D and Trans10K-v2 datasets.

02 Achieves high mIoU scores on the Cityscapes, ACDC, and DADA-seg datasets.

03 Validated through user studies for real-world navigation assistance.
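
The findings above are reported in terms of mIoU (mean intersection over union). As a reminder of what that metric measures, here is a minimal sketch in plain Python. The function name `mean_iou` and the flat-list label format are assumptions made for illustration and are not taken from the paper or its repository. Benchmark evaluations typically accumulate counts over a whole test set, whereas this sketch scores a single prediction.

```python
from typing import List


def mean_iou(pred: List[int], target: List[int], num_classes: int) -> float:
    """Mean IoU over classes that appear in the prediction or the target.

    `pred` and `target` are flattened per-pixel class ids of equal length.
    (Hypothetical helper for illustration only.)
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels where both maps agree on class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either map contains class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example with two classes and four pixels.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(round(score, 4))
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.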

Abstract

Transparent objects, such as glass walls and doors, constitute architectural obstacles hindering the mobility of people with low vision or blindness. For instance, the open space behind glass doors is inaccessible unless it is correctly perceived and interacted with. However, traditional assistive technologies rarely cover the segmentation of these safety-critical transparent objects. In this paper, we build a wearable system with a novel dual-head Transformer for Transparency (Trans4Trans) perception model, which can segment both general and transparent objects. The two dense segmentation results are further combined with depth information in the system to help users navigate safely and assist them in negotiating transparent obstacles. We propose a lightweight Transformer Parsing Module (TPM) to perform multi-scale feature interpretation in the transformer-based decoder. Benefiting from…
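
The abstract states that the two dense segmentation results are combined with depth information, but the excerpt here does not say how. The following is a rough sketch of one plausible way to do that, not the authors' pipeline. The function names, the nested-list map format, the convention that id 0 means "no transparent object", and the convention that a depth of 0.0 means "invalid reading" are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

BACKGROUND = 0  # assumed id for "no transparent object" in the transparent head


def fuse_labels(general: List[List[int]],
                transparent: List[List[int]]) -> List[List[Tuple[str, int]]]:
    """Per pixel, prefer the transparent-head label when it is not background."""
    fused = []
    for g_row, t_row in zip(general, transparent):
        fused.append([
            ("transparent", t) if t != BACKGROUND else ("general", g)
            for g, t in zip(g_row, t_row)
        ])
    return fused


def nearest_transparent_depth(transparent: List[List[int]],
                              depth_m: List[List[float]]) -> Optional[float]:
    """Smallest valid depth (in metres) over pixels labelled transparent."""
    best: Optional[float] = None
    for t_row, d_row in zip(transparent, depth_m):
        for t, d in zip(t_row, d_row):
            # Skip background pixels and invalid (zero) depth readings.
            if t != BACKGROUND and d > 0.0 and (best is None or d < best):
                best = d
    return best


# Toy 2x2 frame: the right column is predicted as a transparent class (id 3).
general = [[1, 1], [2, 2]]
transparent = [[0, 3], [0, 3]]
depth_m = [[4.0, 1.5], [3.0, 0.0]]

fused = fuse_labels(general, transparent)
closest = nearest_transparent_depth(transparent, depth_m)
print(fused[0][1], closest)
```

A wearable system could compare a value like `closest` against a warning distance before alerting the user. That threshold, and the feedback modality, would be design choices outside what the abstract specifies.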

Code & Models

Repositories

jamycheung/Trans4Trans (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Adam · Label Smoothing · Softmax · Byte Pair Encoding · Residual Connection