Cross-modal Fuzzy Alignment Network for Text-Aerial Person Retrieval and A Large-scale Benchmark
Yifei Deng, Chenglong Li, Yuyang Zhang, Guyue Hu, Jin Tang

TL;DR
This paper introduces a novel cross-modal fuzzy alignment network for text-aerial person retrieval. It addresses the challenges of visual variation and semantic alignment with a fuzzy-logic-based approach and a ground-view bridging strategy, and it is supported by a large-scale benchmark dataset.
Contribution
The paper proposes a fuzzy token alignment module and a context-aware dynamic alignment module, enhancing robustness and accuracy in aerial image-text retrieval tasks, along with a new large-scale benchmark dataset.
Findings
Outperforms existing methods on the AERI-PEDES and TBAPR datasets.
Enhances semantic alignment robustness with fuzzy logic and ground-view bridging.
Achieves significant improvements in retrieval accuracy and semantic consistency.
Abstract
Text-aerial person retrieval aims to identify targets in UAV-captured images from eyewitness descriptions, supporting intelligent transportation and public security applications. Compared with ground-view text-image person retrieval, UAV-captured images often suffer from degraded visual information due to drastic variations in viewing angles and flight altitudes, which makes semantic alignment with textual descriptions very challenging. To address this issue, we propose a novel Cross-modal Fuzzy Alignment Network for text-aerial person retrieval. It quantifies token-level reliability with fuzzy logic to achieve accurate fine-grained alignment, and it incorporates ground-view images as a bridge agent to further mitigate the gap between aerial images and text descriptions. In particular, we design the Fuzzy Token Alignment module that employs the fuzzy membership function to dynamically…
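The abstract says the Fuzzy Token Alignment module uses a fuzzy membership function to quantify token-level reliability, but the description is cut off here. The sketch below is a hypothetical illustration of that general idea, not the authors' module. It assumes that each text token is matched to its most similar image patch by cosine similarity, and that a Gaussian membership function (with assumed parameters `center` and `width`) turns that best similarity into a reliability degree in (0, 1]. That degree then weights the token's contribution to the overall text-image score. All function and parameter names are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def gaussian_membership(x: float, center: float, width: float) -> float:
    """Gaussian fuzzy membership: 1.0 at `center`, decaying with distance from it."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def fuzzy_token_alignment(
    text_tokens: List[Vector],
    image_patches: List[Vector],
    center: float = 1.0,  # assumed: a perfect match is fully reliable
    width: float = 0.5,   # assumed: how quickly reliability decays
) -> float:
    """Hypothetical reliability-weighted text-image alignment score.

    Each text token takes its best cosine similarity over the image patches.
    Its reliability is the membership degree of that similarity, and the
    score is the reliability-weighted mean of the best similarities.
    """
    best = [max(cosine(t, p) for p in image_patches) for t in text_tokens]
    weights = [gaussian_membership(s, center, width) for s in best]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * s for w, s in zip(weights, best)) / total
```

For example, with text tokens `[[1.0, 0.0], [0.0, 1.0]]` and a single patch `[1.0, 0.0]`, the second token has no visual match. It receives a low membership degree, so the score stays well above the plain average of 0.5. This down-weighting of unreliable tokens is the behavior the abstract attributes to fuzzy token-level reliability.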
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Video Surveillance and Tracking Methods
