Cross-modal Fuzzy Alignment Network for Text-Aerial Person Retrieval and A Large-scale Benchmark

Yifei Deng; Chenglong Li; Yuyang Zhang; Guyue Hu; Jin Tang

arXiv:2603.20721·cs.CV·March 24, 2026

TL;DR

This paper introduces a novel cross-modal fuzzy alignment network for text-aerial person retrieval, addressing challenges of visual variation and semantic alignment with a fuzzy logic-based approach and a ground-view bridging strategy, supported by a large-scale benchmark dataset.

Contribution

The paper proposes a fuzzy token alignment module and a context-aware dynamic alignment module, enhancing robustness and accuracy in aerial image-text retrieval tasks, along with a new large-scale benchmark dataset.

Findings

01 Outperforms existing methods on the AERI-PEDES and TBAPR datasets.

02 Enhances semantic alignment robustness with fuzzy logic and ground-view bridging.

03 Achieves significant improvements in retrieval accuracy and semantic consistency.

Abstract

Text-aerial person retrieval aims to identify targets in UAV-captured images from eyewitness descriptions, supporting intelligent transportation and public security applications. Compared to ground-view text-image person retrieval, UAV-captured images often suffer from degraded visual information due to drastic variations in viewing angles and flight altitudes, making semantic alignment with textual descriptions very challenging. To address this issue, we propose a novel Cross-modal Fuzzy Alignment Network, which quantifies the token-level reliability by fuzzy logic to achieve accurate fine-grained alignment and incorporates ground-view images as a bridge agent to further mitigate the gap between aerial images and text descriptions, for text-aerial person retrieval. In particular, we design the Fuzzy Token Alignment module that employs the fuzzy membership function to dynamically…
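As a rough illustration of what quantifying token-level reliability with a fuzzy membership function could look like, the sketch below weights each text token by a sigmoid membership of its best cosine match among image patches. It is a minimal sketch under stated assumptions, not the paper's Fuzzy Token Alignment module: the sigmoid form, the `center` and `slope` parameters, the max-over-patches matching, and all function names are hypothetical choices made only for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sigmoid_membership(x, center=0.5, slope=10.0):
    """Hypothetical fuzzy membership: maps a similarity to a degree in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))


def fuzzy_weighted_similarity(text_tokens, image_patches, center=0.5, slope=10.0):
    """Aggregate token-to-patch similarities, weighted by fuzzy reliability.

    Assumption: a token's reliability is the membership degree of its best
    cosine match over all patches. This is an illustrative choice only.
    """
    # Best-matching patch similarity for every text token.
    best = [max(cosine(t, p) for p in image_patches) for t in text_tokens]
    # Membership degree acts as a soft reliability weight per token.
    weights = [sigmoid_membership(b, center, slope) for b in best]
    # Weighted average, so poorly matched tokens contribute little.
    score = sum(w * b for w, b in zip(weights, best)) / sum(weights)
    return score, weights


# Toy example: the first token matches the only patch, the second does not.
score, weights = fuzzy_weighted_similarity(
    text_tokens=[[1.0, 0.0], [0.0, 1.0]],
    image_patches=[[1.0, 0.0]],
)
```

In this toy case the unmatched token receives a membership close to zero, so the aggregated score is dominated by the well-matched token. A hard threshold would discard the weak token entirely, whereas the soft weight keeps a small contribution from it.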


Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Video Surveillance and Tracking Methods