RoadFormer+: Delivering RGB-X Scene Parsing through Scale-Aware Information Decoupling and Advanced Heterogeneous Feature Fusion

Jianxin Huang, Jiahang Li, Ning Jia, Yuxiang Sun, Chengju Liu, Qijun Chen, and Rui Fan

arXiv:2407.21631 · cs.CV · August 23, 2024 · 1 citation

Open Access

TL;DR

RoadFormer+ is a versatile scene parsing model that effectively fuses RGB-X data using scale-aware feature decoupling and advanced fusion techniques, achieving state-of-the-art results across multiple datasets.

Contribution

Introduces RoadFormer+ with a hybrid feature decoupling encoder and dual-branch multi-scale fusion, enhancing robustness and performance in universal scene parsing tasks.

Findings

01 Ranks first on the KITTI Road benchmark

02 Achieves state-of-the-art mIoU on the Cityscapes, MFNet, FMB, and ZJU datasets

03 Reduces parameters by 65% compared to RoadFormer
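
The mIoU metric cited in the findings is the mean, over classes, of intersection-over-union between predicted and ground-truth pixel labels. The following is a minimal sketch of that standard definition, not the evaluation code used by the authors or by any of the benchmarks named above; the function names, the toy labels, and the choice to skip classes that never appear are assumptions made for illustration, and individual benchmarks may apply their own conventions (for example, ignored labels).

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN).

    Classes with no ground-truth and no predicted pixels are skipped
    (an assumption of this sketch).
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp    # predicted c, true class differs
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: flattened labels of a 2x3 image with 3 hypothetical classes.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(confusion_matrix(truth, pred, 3))
print(f"mIoU = {score:.3f}")
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5.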

Abstract

Task-specific data-fusion networks have marked considerable achievements in urban scene parsing. Among these networks, our recently proposed RoadFormer successfully extracts heterogeneous features from RGB images and surface normal maps and fuses these features through attention mechanisms, demonstrating compelling efficacy in RGB-Normal road scene parsing. However, its performance significantly deteriorates when handling other types/sources of data or performing more universal, all-category scene parsing tasks. To overcome these limitations, this study introduces RoadFormer+, an efficient, robust, and adaptable model capable of effectively fusing RGB-X data, where "X" represents additional types/modalities of data such as depth, thermal, surface normal, and polarization. Specifically, we propose a novel hybrid feature decoupling encoder to extract heterogeneous features and decouple…

Taxonomy

Topics: Image Processing and 3D Reconstruction · Medical Image Segmentation Techniques · Industrial Vision Systems and Defect Detection

Methods: Linear Layer · Residual Connection · Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Dense Connections