RoadFormer+: Delivering RGB-X Scene Parsing through Scale-Aware Information Decoupling and Advanced Heterogeneous Feature Fusion
Jianxin Huang, Jiahang Li, Ning Jia, Yuxiang Sun, Chengju Liu, Qijun Chen, and Rui Fan

TL;DR
RoadFormer+ is a versatile scene parsing model that effectively fuses RGB-X data using scale-aware feature decoupling and advanced fusion techniques, achieving state-of-the-art results across multiple datasets.
Contribution
Introduces RoadFormer+ with a hybrid feature decoupling encoder and dual-branch multi-scale fusion, enhancing robustness and performance in universal scene parsing tasks.
Findings
Ranks first on the KITTI Road benchmark
Achieves state-of-the-art mIoU on the Cityscapes, MFNet, FMB, and ZJU datasets
Reduces parameters by 65% compared to RoadFormer
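The findings above are reported in mIoU (mean intersection over union), the standard segmentation metric. As a minimal sketch of how that metric is typically computed, assuming integer class labels per pixel: the function name `mean_iou` and its arguments are illustrative and not taken from the paper, and individual benchmarks may differ in details such as ignored labels.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that appear in pred or target (illustrative helper)."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predicted classes.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(conf)
    # Union = predicted pixels + ground-truth pixels - overlap, per class.
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    valid = union > 0  # skip classes absent from both pred and target
    return float((intersection[valid] / union[valid]).mean())

# Toy example with four pixels and two classes.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

Here class 0 has IoU 1/2 and class 1 has IoU 2/3, so `score` is their average, 7/12.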
Abstract
Task-specific data-fusion networks have marked considerable achievements in urban scene parsing. Among these networks, our recently proposed RoadFormer successfully extracts heterogeneous features from RGB images and surface normal maps and fuses these features through attention mechanisms, demonstrating compelling efficacy in RGB-Normal road scene parsing. However, its performance significantly deteriorates when handling other types/sources of data or performing more universal, all-category scene parsing tasks. To overcome these limitations, this study introduces RoadFormer+, an efficient, robust, and adaptable model capable of effectively fusing RGB-X data, where "X" represents additional types/modalities of data such as depth, thermal, surface normal, and polarization. Specifically, we propose a novel hybrid feature decoupling encoder to extract heterogeneous features and decouple…
Taxonomy
Topics: Image Processing and 3D Reconstruction · Medical Image Segmentation Techniques · Industrial Vision Systems and Defect Detection
Methods: Linear Layer · Residual Connection · Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Dense Connections
