Efficient Fourier Filtering Network with Contrastive Learning for AAV-based Unaligned Bimodal Salient Object Detection

Pengfei Lyu; Pak-Hei Yeung; Xiaosheng Yu; Xiufei Cheng; Chengdong Wu; Jagath C. Rajapakse

arXiv:2411.03728 · cs.CV · November 19, 2025

TL;DR

This paper introduces AlignSal, an efficient Fourier filtering network with contrastive learning for unaligned bi-modal salient object detection in aerial imagery, achieving real-time performance with significantly reduced computational costs.

Contribution

The paper proposes a Fourier filter network with contrastive learning that improves both the efficiency and the accuracy of unaligned bi-modal salient object detection, cutting parameter count and computation while increasing inference speed.

Findings

1. Achieves real-time inference speed on AAV datasets.
2. Reduces model parameters by 70% and FLOPs by 49.4%.
3. Outperforms 19 state-of-the-art models across multiple metrics.

Abstract

Autonomous aerial vehicle (AAV)-based bi-modal salient object detection (BSOD) aims to segment salient objects in a scene utilizing complementary cues in unaligned RGB and thermal image pairs. However, the high computational expense of existing AAV-based BSOD models limits their applicability to real-world AAV devices. To address this problem, we propose an efficient Fourier filter network with contrastive learning that achieves both real-time and accurate performance. Specifically, we first design a semantic contrastive alignment loss to align the two modalities at the semantic level, which facilitates mutual refinement in a parameter-free way. Second, inspired by the fast Fourier transform that obtains global relevance in linear complexity, we propose synchronized alignment fusion, which aligns and fuses bi-modal features in the channel and spatial dimensions by a hierarchical…
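
The two ideas named in the abstract, contrastive alignment of the modalities and global mixing through the fast Fourier transform, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: a generic symmetric InfoNCE loss stands in for the semantic contrastive alignment loss, and a plain frequency-domain filter stands in for synchronized alignment fusion. The function names `info_nce_alignment` and `fourier_global_filter`, the `temperature` value, and the `weight` tensor are all hypothetical choices made for illustration.

```python
import numpy as np


def info_nce_alignment(rgb, thermal, temperature=0.1):
    """Symmetric InfoNCE loss between paired RGB and thermal embeddings.

    Assumption: a generic contrastive loss, not the paper's exact formulation.
    rgb, thermal: arrays of shape (batch, dim); row i of each is a matched pair.
    The loss itself has no learnable parameters.
    """
    # L2-normalise so that dot products become cosine similarities.
    rgb = rgb / np.linalg.norm(rgb, axis=1, keepdims=True)
    thermal = thermal / np.linalg.norm(thermal, axis=1, keepdims=True)
    logits = rgb @ thermal.T / temperature  # shape (batch, batch)

    def diagonal_cross_entropy(z):
        # Numerically stable log-softmax over each row; matched pairs sit on the diagonal.
        z = z - z.max(axis=1, keepdims=True)
        log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the RGB-to-thermal and thermal-to-RGB directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))


def fourier_global_filter(feat, weight):
    """Mix spatial information globally by filtering in the frequency domain.

    Assumption: a generic global-filter pattern, not the paper's fusion module.
    feat:   real array of shape (channels, height, width).
    weight: complex array of shape (channels, height, width // 2 + 1).
    """
    spectrum = np.fft.rfft2(feat, axes=(-2, -1))  # to the frequency domain
    spectrum = spectrum * weight                  # element-wise filter
    return np.fft.irfft2(spectrum, s=feat.shape[-2:], axes=(-2, -1))


# Illustrative usage with random data (hypothetical shapes).
rng = np.random.default_rng(0)
rgb_emb = rng.normal(size=(8, 32))
thermal_emb = rgb_emb + 0.05 * rng.normal(size=(8, 32))
loss = info_nce_alignment(rgb_emb, thermal_emb)

feat = rng.normal(size=(4, 16, 16))
weight = np.ones((4, 16, 9), dtype=complex)  # all-pass filter
filtered = fourier_global_filter(feat, weight)
```

Because every output position of the filter depends on every input position, a single element-wise product in the frequency domain gives a global receptive field. In a trained network `weight` would typically be learned, whereas the contrastive loss adds no parameters at all.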

Code & Models

Repositories

joshualpf/alignsal · PyTorch · Official

Taxonomy

Topics: Advanced Image Fusion Techniques · Infrared Target Detection Methodologies · Visual Attention and Saliency Detection

Methods: ALIGN · Contrastive Learning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings