GSAlign: Geometric and Semantic Alignment Network for Aerial-Ground Person Re-Identification
Qiao Li, Jie Li, Yukang Zhang, Lei Tan, Jing Chen, Jiayi Ji

TL;DR
GSAlign is a network for aerial-ground person re-identification that jointly corrects geometric distortion with learnable warping and semantic misalignment with visibility-aware masking, with reported gains of +18.8% mAP and +16.8% Rank-1 accuracy over the previous state of the art.
Contribution
The paper introduces GSAlign, a framework built on a Learnable Thin Plate Spline (LTPS) module and a Dynamic Alignment Module (DAM) to handle the extreme viewpoint variations and occlusions of aerial-ground person re-ID.
Findings
Reports a +18.8% mAP gain over the previous state of the art.
Reports a +16.8% Rank-1 accuracy gain over the previous state of the art.
Evaluated on the CARGO dataset under four protocols.
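For readers unfamiliar with the two metrics named in the findings, the following is a minimal sketch of how Rank-1 accuracy and mAP are conventionally computed in re-ID from a query-gallery distance matrix. The function name `rank1_and_map` and the toy data are illustrative assumptions, not taken from the paper, and the sketch omits protocol-specific filtering (for example same-camera exclusion or the CARGO protocol splits).

```python
import numpy as np

def rank1_and_map(dist, query_ids, gallery_ids):
    """Return (Rank-1 accuracy, mAP) for a query-gallery distance matrix.

    dist: (num_query, num_gallery) array, smaller means more similar.
    query_ids, gallery_ids: identity labels for each row / column.
    """
    dist = np.asarray(dist, dtype=float)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)

    rank1_hits, average_precisions = [], []
    for q in range(dist.shape[0]):
        # Sort the gallery from most to least similar for this query.
        order = np.argsort(dist[q], kind="stable")
        matches = gallery_ids[order] == query_ids[q]
        if not matches.any():
            continue  # no true match in the gallery, so the query is skipped

        # Rank-1: is the top-ranked gallery item the right identity?
        rank1_hits.append(float(matches[0]))

        # Average precision: mean of precision@k over the ranks k of true matches.
        hit_positions = np.flatnonzero(matches)  # 0-based ranks
        precisions = np.arange(1, len(hit_positions) + 1) / (hit_positions + 1)
        average_precisions.append(precisions.mean())

    return float(np.mean(rank1_hits)), float(np.mean(average_precisions))


# Toy example (hypothetical data): two queries, three gallery images.
toy_dist = [[0.1, 0.9, 0.5],
            [0.2, 0.8, 0.3]]
rank1, mean_ap = rank1_and_map(toy_dist, query_ids=[1, 2], gallery_ids=[1, 2, 1])
print(rank1, mean_ap)
```

In the toy example the first query retrieves both of its matches at the top of the ranking, while the second query finds its only match at rank 3, which lowers both metrics.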
Abstract
Aerial-Ground person re-identification (AG-ReID) is an emerging yet challenging task that aims to match pedestrian images captured from drastically different viewpoints, typically from unmanned aerial vehicles (UAVs) and ground-based surveillance cameras. The task poses significant challenges due to extreme viewpoint discrepancies, occlusions, and domain gaps between aerial and ground imagery. While prior works have made progress by learning cross-view representations, they remain limited in handling severe pose variations and spatial misalignment. To address these issues, we propose a Geometric and Semantic Alignment Network (GSAlign) tailored for AG-ReID. GSAlign introduces two key components to jointly tackle geometric distortion and semantic misalignment in aerial-ground matching: a Learnable Thin Plate Spline (LTPS) Module and a Dynamic Alignment Module (DAM). The LTPS module…
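The abstract is cut off before it describes the LTPS module, so the following is background only: a minimal sketch of a classical (non-learnable) thin plate spline fit between two sets of 2D control points, which is the general warping technique the module's name refers to. It is not the paper's implementation; the function names `fit_tps` and `apply_tps`, the kernel form, and the control points are all assumptions chosen for illustration.

```python
import numpy as np

def _tps_kernel(r2):
    # Radial basis U(r) = r^2 * log(r^2), defined as 0 at r = 0.
    safe = np.where(r2 > 0, r2, 1.0)
    return r2 * np.log(safe)

def fit_tps(src, dst):
    """Solve for TPS parameters that map control points src onto dst.

    src, dst: (n, 2) arrays of distinct, non-collinear points.
    Returns an (n + 3, 2) array: n kernel weights, then 3 affine rows.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    r2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    K = _tps_kernel(r2)
    P = np.hstack([np.ones((n, 1)), src])  # affine part: [1, x, y]
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = K
    L[:n, n:] = P
    L[n:, :n] = P.T
    rhs = np.vstack([dst, np.zeros((3, 2))])
    return np.linalg.solve(L, rhs)

def apply_tps(params, src, points):
    """Warp arbitrary (m, 2) points with parameters from fit_tps."""
    src = np.asarray(src, dtype=float)
    points = np.asarray(points, dtype=float)
    n = src.shape[0]
    r2 = ((points[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    U = _tps_kernel(r2)
    P = np.hstack([np.ones((points.shape[0], 1)), points])
    return U @ params[:n] + P @ params[n:]


# Hypothetical control points: image corners stay fixed, the centre shifts.
src_pts = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]], dtype=float)
dst_pts = src_pts.copy()
dst_pts[4] = [0.6, 0.4]
params = fit_tps(src_pts, dst_pts)
print(apply_tps(params, src_pts, src_pts))  # control points land on dst_pts
```

In a learnable variant, the target control points would typically be predicted by a network rather than fixed by hand; how GSAlign does this is not stated in the text available here.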
