Efficient Fourier Filtering Network with Contrastive Learning for AAV-based Unaligned Bimodal Salient Object Detection
Pengfei Lyu, Pak-Hei Yeung, Xiaosheng Yu, Xiufei Cheng, Chengdong Wu, Jagath C. Rajapakse

TL;DR
This paper introduces AlignSal, an efficient Fourier filtering network with contrastive learning for unaligned bi-modal salient object detection in aerial imagery, achieving real-time performance with significantly reduced computational costs.
Contribution
The paper proposes a Fourier filter network with contrastive learning that improves both the efficiency and the accuracy of unaligned bi-modal salient object detection, cutting parameters and computation while increasing inference speed.
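As a rough illustration of why Fourier filtering gives global context cheaply, the sketch below multiplies a feature map by a per-frequency weight and checks that the result equals a circular convolution whose kernel spans the whole map. It is a generic NumPy demonstration, not the paper's module: the names `fourier_filter` and `circular_conv2d`, the tensor shapes, and the random weights are all assumptions made for the example.

```python
import numpy as np

def fourier_filter(feat, freq_weight):
    """Filter a (C, H, W) real feature map in the frequency domain.

    freq_weight is a (C, H, W) complex array, one weight per frequency bin.
    The cost is dominated by the FFT pair, O(HW log HW) per channel.
    """
    spectrum = np.fft.fft2(feat, axes=(-2, -1))
    return np.fft.ifft2(spectrum * freq_weight, axes=(-2, -1)).real

def circular_conv2d(feat, kernel):
    """Reference: direct circular convolution with a map-sized kernel.

    Every output position sees every input position, at O((HW)^2) cost.
    """
    _, height, width = feat.shape
    out = np.zeros_like(feat)
    for dy in range(height):
        for dx in range(width):
            weight = kernel[:, dy, dx][:, None, None]
            out += weight * np.roll(feat, (dy, dx), axis=(1, 2))
    return out

rng = np.random.default_rng(0)
feat = rng.normal(size=(2, 8, 8))    # hypothetical 2-channel feature map
kernel = rng.normal(size=(2, 8, 8))  # hypothetical global spatial kernel

# By the convolution theorem, the spectrum of the kernel is the filter weight.
filtered = fourier_filter(feat, np.fft.fft2(kernel, axes=(-2, -1)))
direct = circular_conv2d(feat, kernel)
matches = np.allclose(filtered, direct)
```

In a learned network the frequency weights would typically be trainable or predicted from the features; here they are derived from a random kernel only so that the equivalence can be checked.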
Findings
Achieves real-time inference speed on AAV datasets.
Reduces model parameters by 70% and FLOPs by 49.4%.
Outperforms 19 state-of-the-art models across multiple metrics.
Abstract
Autonomous aerial vehicle (AAV)-based bi-modal salient object detection (BSOD) aims to segment salient objects in a scene utilizing complementary cues in unaligned RGB and thermal image pairs. However, the high computational expense of existing AAV-based BSOD models limits their applicability to real-world AAV devices. To address this problem, we propose an efficient Fourier filter network with contrastive learning that achieves both real-time and accurate performance. Specifically, we first design a semantic contrastive alignment loss to align the two modalities at the semantic level, which facilitates mutual refinement in a parameter-free way. Second, inspired by the fast Fourier transform that obtains global relevance in linear complexity, we propose synchronized alignment fusion, which aligns and fuses bi-modal features in the channel and spatial dimensions by a hierarchical…
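The idea of aligning two modalities at the semantic level with a loss that adds no learnable parameters can be sketched with a generic symmetric InfoNCE objective. The excerpt does not give the paper's exact formulation, so everything below is an assumption for illustration: the function name `contrastive_alignment_loss`, the use of one pooled embedding per image, and the temperature value.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def contrastive_alignment_loss(rgb_emb, thermal_emb, temperature=0.1):
    """Symmetric InfoNCE over (B, D) embeddings (assumed form, not the paper's).

    Row i of rgb_emb and row i of thermal_emb come from the same scene and
    form the positive pair; all other rows in the batch act as negatives.
    """
    rgb = l2_normalize(rgb_emb)
    thermal = l2_normalize(thermal_emb)
    logits = rgb @ thermal.T / temperature
    idx = np.arange(len(rgb))
    loss_rgb_to_thermal = -log_softmax(logits)[idx, idx].mean()
    loss_thermal_to_rgb = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_rgb_to_thermal + loss_thermal_to_rgb)

rng = np.random.default_rng(0)
rgb = rng.normal(size=(4, 16))                   # hypothetical pooled RGB features
aligned = rgb + 0.01 * rng.normal(size=(4, 16))  # thermal features close to their RGB pair
shuffled = aligned[::-1]                         # same features with pairs mismatched

loss_aligned = contrastive_alignment_loss(rgb, aligned)
loss_shuffled = contrastive_alignment_loss(rgb, shuffled)
```

The loss is low when each RGB embedding is closest to its own thermal partner and high when the pairing is broken, which is the behavior an alignment objective needs. It only compares existing features, so it adds nothing to the model's parameter count.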
Taxonomy
Topics: Advanced Image Fusion Techniques · Infrared Target Detection Methodologies · Visual Attention and Saliency Detection
Methods: ALIGN · Contrastive Learning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
