RoadFormer: Duplex Transformer for RGB-Normal Semantic Road Scene Parsing
Jiahang Li, Yikang Zhang, Peng Yun, Guangliang Zhou, Qijun Chen, Rui Fan

TL;DR
RoadFormer is a Transformer-based network that fuses RGB and surface normal data for improved semantic road scene parsing, especially for detecting hazards and defects, outperforming existing methods on multiple datasets.
Contribution
Introduces RoadFormer, a novel duplex Transformer architecture for multi-modal road scene parsing, and releases the SYN-UDTIRI dataset for comprehensive evaluation.
Findings
Outperforms state-of-the-art methods on multiple datasets
Ranks first on the KITTI road benchmark
Demonstrates effectiveness in detecting road defects
Abstract
The recent advancements in deep convolutional neural networks have shown significant promise in the domain of road scene parsing. Nevertheless, the existing works focus primarily on freespace detection, with little attention given to hazardous road defects that could compromise both driving safety and comfort. In this paper, we introduce RoadFormer, a novel Transformer-based data-fusion network developed for road scene parsing. RoadFormer utilizes a duplex encoder architecture to extract heterogeneous features from both RGB images and surface normal information. The encoded features are subsequently fed into a novel heterogeneous feature synergy block for effective feature fusion and recalibration. The pixel decoder then learns multi-scale long-range dependencies from the fused and recalibrated heterogeneous features, which are subsequently processed by a Transformer decoder to produce…
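The fusion-and-recalibration step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the internals of the heterogeneous feature synergy block, so everything below is an assumption: the function name `fuse_and_recalibrate` is hypothetical, and the per-channel sigmoid gate is a generic squeeze-and-excitation-style stand-in rather than the authors' design. Feature maps are plain Python lists so the example runs without any deep learning library.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_and_recalibrate(rgb_feat, normal_feat):
    """Concatenate two feature maps channel-wise, then gate each channel.

    Each feature map is a list of channels; each channel is a flat list of
    floats, one per spatial position. The gating here is a hypothetical
    stand-in, NOT the paper's heterogeneous feature synergy block.
    """
    fused = rgb_feat + normal_feat  # channel-wise concatenation
    recalibrated = []
    for channel in fused:
        # "Squeeze": summarize the channel by its spatial mean.
        # "Excite": turn that summary into a gate in (0, 1).
        gate = sigmoid(sum(channel) / len(channel))
        recalibrated.append([gate * value for value in channel])
    return recalibrated


# Toy inputs: an RGB branch with 2 channels and a surface-normal branch with 1.
rgb_feat = [[1.0, 3.0], [0.0, 0.0]]
normal_feat = [[-2.0, -2.0]]
fused = fuse_and_recalibrate(rgb_feat, normal_feat)
```

In this toy setup the output keeps all three channels, and channels with a larger mean activation receive a gate closer to 1. A real model would learn the gating weights and operate on multi-scale tensors from both encoders.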
Taxonomy
Topics: Advanced Neural Network Applications · Infrastructure Maintenance and Monitoring · Automated Road and Building Extraction
Methods: Attention Is All You Need · Softmax · Dense Connections · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Residual Connection · Adam · Focus · Linear Layer · Multi-Head Attention
