RoadFormer: Duplex Transformer for RGB-Normal Semantic Road Scene Parsing

Jiahang Li; Yikang Zhang; Peng Yun; Guangliang Zhou; Qijun Chen; Rui Fan

arXiv:2309.10356 · cs.CV · July 2, 2024 · 1 citation

TL;DR

RoadFormer is a Transformer-based network that fuses RGB and surface normal data for improved semantic road scene parsing, especially for detecting hazards and defects, outperforming existing methods on multiple datasets.

Contribution

Introduces RoadFormer, a novel duplex Transformer architecture for multi-modal road scene parsing, and releases the SYN-UDTIRI dataset for comprehensive evaluation.

Findings

01 Outperforms state-of-the-art methods on multiple datasets

02 Ranks first on the KITTI road benchmark

03 Demonstrates effectiveness in detecting road defects

Abstract

The recent advancements in deep convolutional neural networks have shown significant promise in the domain of road scene parsing. Nevertheless, the existing works focus primarily on freespace detection, with little attention given to hazardous road defects that could compromise both driving safety and comfort. In this paper, we introduce RoadFormer, a novel Transformer-based data-fusion network developed for road scene parsing. RoadFormer utilizes a duplex encoder architecture to extract heterogeneous features from both RGB images and surface normal information. The encoded features are subsequently fed into a novel heterogeneous feature synergy block for effective feature fusion and recalibration. The pixel decoder then learns multi-scale long-range dependencies from the fused and recalibrated heterogeneous features, which are subsequently processed by a Transformer decoder to produce…
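To make the "fusion and recalibration" step in the abstract more concrete, here is a minimal sketch in plain Python. It is not the paper's heterogeneous feature synergy block, whose design is not given in this abstract. It assumes the simplest possible stand-in: features from the two encoders are concatenated along the channel axis, and each channel is then rescaled by a sigmoid gate computed from its own global average. The function names and the parameter-free gate are illustrative assumptions; a real network would use tensors and learned weights.

```python
import math


def global_average(channel):
    """Mean over all values of one 2-D channel (a list of rows)."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse_and_recalibrate(rgb_feats, normal_feats):
    """Hypothetical fusion step, NOT the RoadFormer block.

    Each argument is a list of channels; each channel is a list of rows.
    """
    # Fusion: channel-wise concatenation of the two feature sets.
    fused = rgb_feats + normal_feats
    # Recalibration: one gate per channel from its global average
    # (an assumed, parameter-free stand-in for a learned gate).
    gates = [sigmoid(global_average(c)) for c in fused]
    return [[[g * v for v in row] for row in c] for c, g in zip(fused, gates)]


# Toy inputs: one RGB-branch channel and two normal-branch channels, each 2x2.
rgb = [[[1.0, 1.0], [1.0, 1.0]]]
normal = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[-2.0, 2.0], [2.0, -2.0]],
]
out = fuse_and_recalibrate(rgb, normal)
print(len(out), out[2][0][0])
```

In this toy run the output has three channels (one from the RGB branch, two from the normal branch). The third channel averages to zero, so its gate is 0.5 and every value in it is halved.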

Taxonomy

Topics: Advanced Neural Network Applications · Infrastructure Maintenance and Monitoring · Automated Road and Building Extraction

Methods: Attention Is All You Need · Softmax · Dense Connections · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Residual Connection · Adam · Focus · Linear Layer · Multi-Head Attention