Text-based Aerial-Ground Person Retrieval

Xinyu Zhou; Yu Wu; Jiayao Ma; Wenhao Wang; Min Cao; Mang Ye

arXiv:2511.08369 · cs.CV · November 12, 2025

TL;DR

This paper introduces a new task of retrieving person images from aerial and ground views using text descriptions, supported by a new dataset and a novel retrieval framework that handles large viewpoint differences.

Contribution

The paper presents the TAG-PEDES dataset, built with diversified textual descriptions, and the TAG-CLIP framework, which handles view heterogeneity through a hierarchically-routed mixture of experts module and a viewpoint decoupling strategy.

Findings

01. TAG-CLIP outperforms existing methods on TAG-PEDES and T-PR benchmarks.

02. The dataset enables robust training for cross-view text-based person retrieval.

03. Viewpoint decoupling improves cross-modal alignment in heterogeneous views.

Abstract

This work introduces Text-based Aerial-Ground Person Retrieval (TAG-PR), which aims to retrieve person images from heterogeneous aerial and ground views using textual descriptions. Unlike traditional Text-based Person Retrieval (T-PR), which focuses solely on ground-view images, TAG-PR has greater practical significance and presents unique challenges due to the large viewpoint discrepancy across images. To support this task, we contribute: (1) the TAG-PEDES dataset, constructed from public benchmarks with automatically generated textual descriptions and enhanced by a diversified text generation paradigm to ensure robustness under view heterogeneity; and (2) TAG-CLIP, a novel retrieval framework that addresses view heterogeneity through a hierarchically-routed mixture of experts module to learn view-specific and view-agnostic features and a viewpoint decoupling strategy to decouple…
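
To make the task setup concrete, the following is a minimal sketch of text-to-image retrieval over a gallery that mixes aerial and ground views. It is not TAG-CLIP: the function names (`cosine`, `rank_gallery`), the gallery layout, and the toy three-dimensional embeddings are all assumptions made for illustration. In a real system the embeddings would come from trained text and image encoders, and the sketch only shows the final ranking step by cosine similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_gallery(text_emb, gallery):
    """Rank gallery images by similarity to one text query.

    gallery is a list of (image_id, view, embedding) tuples, where view is
    "aerial" or "ground". This layout is hypothetical, not the dataset's format.
    Returns (image_id, view, score) tuples, best match first.
    """
    scored = [(cosine(text_emb, emb), image_id, view)
              for image_id, view, emb in gallery]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(image_id, view, score) for score, image_id, view in scored]


# Toy embeddings standing in for encoder outputs (assumed values).
gallery = [
    ("g1", "ground", [1.0, 0.0, 0.0]),
    ("a1", "aerial", [0.9, 0.1, 0.0]),
    ("g2", "ground", [0.0, 1.0, 0.0]),
]
query = [1.0, 0.0, 0.0]

ranking = rank_gallery(query, gallery)
for image_id, view, score in ranking:
    print(f"{image_id} ({view}): {score:.4f}")
```

In this toy gallery the aerial image `a1` and the ground image `g1` compete in the same ranked list for one text query, which is the setting where the viewpoint discrepancy described in the abstract matters.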

Taxonomy

Topics: Video Surveillance and Tracking Methods · UAV Applications and Optimization · Advanced Neural Network Applications