HAPNet: Toward Superior RGB-Thermal Scene Parsing via Hybrid, Asymmetric, and Progressive Heterogeneous Feature Fusion
Jiahang Li, Peng Yun, Yang Xu, Ye Zhang, Mingjian Sun, Qijun Chen, Ilin Alexander, Rui Fan

TL;DR
HAPNet uses a hybrid, asymmetric encoder that pairs a vision foundation model (VFM) with a convolutional network for RGB-thermal scene parsing, and reports state-of-the-art results by exploiting the differences between the two modalities.
Contribution
The paper proposes a novel hybrid, asymmetric encoder with VFM and CNN components, and a dual-path progressive fusion strategy, to better utilize heterogeneous features in RGB-thermal scene parsing.
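To make the idea of an asymmetric two-branch encoder with stage-by-stage fusion concrete, here is a toy NumPy sketch. It is not the authors' architecture: the patch-projection "VFM" branch, the average-pooling "CNN" branch, the additive fusion rule, and every function name and parameter are hypothetical stand-ins chosen for illustration, and the sketch uses a single fusion path rather than the paper's dual-path design.

```python
import numpy as np

rng = np.random.default_rng(0)

def patch_embed(img, patch, weight):
    # Hypothetical stand-in for a transformer-style (VFM) branch:
    # cut the image into non-overlapping patches and project each one linearly.
    h, w, c = img.shape
    gh, gw = h // patch, w // patch
    x = img[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4).reshape(gh, gw, patch * patch * c)
    return x @ weight  # shape: (gh, gw, dim)

def avg_pool(x, k):
    # k x k average pooling, used as a crude stand-in for a strided CNN stage.
    h, w, c = x.shape
    gh, gw = h // k, w // k
    return x[:gh * k, :gw * k].reshape(gh, k, gw, k, c).mean(axis=(1, 3))

def encode_rgb(rgb, dim, patches=(4, 8, 16)):
    # RGB branch: one feature map per scale (1/4, 1/8, 1/16 resolution).
    feats = []
    for p in patches:
        weight = rng.standard_normal((p * p * rgb.shape[2], dim)) * 0.01
        feats.append(patch_embed(rgb, p, weight))
    return feats

def encode_thermal(thermal, dim, first_stride=4, n_stages=3):
    # Thermal branch: a different, lighter pipeline (hence "asymmetric").
    weight = rng.standard_normal((thermal.shape[2], dim)) * 0.01
    x = avg_pool(thermal, first_stride) @ weight  # 1x1 projection to dim channels
    feats = [x]
    for _ in range(n_stages - 1):
        x = avg_pool(x, 2)
        feats.append(x)
    return feats

def progressive_fusion(rgb_feats, thermal_feats):
    # Assumed fusion rule: add the two modalities at each stage and carry the
    # previous stage's fused map forward after downsampling it by 2.
    fused, carry = [], None
    for f_rgb, f_th in zip(rgb_feats, thermal_feats):
        f = f_rgb + f_th
        if carry is not None:
            f = f + avg_pool(carry, 2)
        fused.append(f)
        carry = f
    return fused

rgb = rng.random((64, 64, 3))      # toy RGB image
thermal = rng.random((64, 64, 1))  # toy single-channel thermal image
fused = progressive_fusion(encode_rgb(rgb, 8), encode_thermal(thermal, 8))
shapes = [f.shape for f in fused]
```

In this sketch the two branches share only their output shapes at each stage, which is what lets the fusion step combine them; a real model would replace both stand-ins with learned networks and the addition with a learned fusion module.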
Findings
Achieves superior performance on three public datasets.
Outperforms all existing RGB-thermal scene parsing networks.
Demonstrates the effectiveness of asymmetric, hybrid feature fusion.
Abstract
Data-fusion networks have shown significant promise for RGB-thermal scene parsing. However, the majority of existing studies have relied on symmetric duplex encoders for heterogeneous feature extraction and fusion, paying inadequate attention to the inherent differences between RGB and thermal modalities. Recent progress in vision foundation models (VFMs) trained through self-supervision on vast amounts of unlabeled data has proven their ability to extract informative, general-purpose features. However, this potential has yet to be fully leveraged in the domain. In this study, we take one step toward this new research area by exploring a feasible strategy to fully exploit VFM features for RGB-thermal scene parsing. Specifically, we delve deeper into the unique characteristics of RGB and thermal modalities, thereby designing a hybrid, asymmetric encoder that incorporates both a VFM and a…
Taxonomy
Topics: Advanced Neural Network Applications · Industrial Vision Systems and Defect Detection · Generative Adversarial Networks and Image Synthesis
