TL;DR
This paper introduces ViSA, a view-aware semantic alignment framework for aerial-ground person re-identification. It models viewpoint-specific cues with expert-driven token generation and local fusion modules, and reports a 10.06% mAP improvement on the CARGO benchmark.
Contribution
Proposes ViSA, a novel view-aware framework with expert-driven token generation and local fusion modules for improved cross-view person re-identification.
Findings
Achieves a 10.06% mAP improvement on the CARGO benchmark.
Demonstrates superior performance across three AGPReID datasets.
Models viewpoint-specific cues with the Expert-driven Token Generation Module (ETGM) and the Dual-branch Local Fusion Module (DLFM).
Abstract
Aerial-Ground Person Re-Identification (AGPReID) remains highly challenging due to drastic viewpoint variations between drones and fixed cameras. Existing methods typically follow a view-invariant paradigm, aligning shared features across views to achieve robustness. However, the view-invariant paradigm inherently enforces part-level alignment, which ignores view-specific cues and discriminative identity information. To address this, this work proposes ViSA (View-aware Semantic Alignment), a view-aware framework that achieves cross-view semantic consistency and comprises an Expert-driven Token Generation Module (ETGM) and a Dual-branch Local Fusion Module (DLFM). The former constructs a set of view-aware experts to generate adaptive semantic queries that perceive viewpoint-specific patterns, while the latter leverages graph reasoning to extract and align local regions responsive to different…
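The abstract does not say how the view-aware experts are built or combined, so the following is only a minimal sketch of one common way to realize "experts that generate adaptive queries": a softmax-gated mixture of linear experts. It is not the paper's ETGM. The class name `ExpertTokenGenerator`, its parameters (`dim`, `num_experts`, `num_tokens`), the random weights, and the gating scheme are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def make_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class ExpertTokenGenerator:
    """Hypothetical gated mixture of experts (an assumption, not the paper's ETGM).

    Each expert maps a global image feature to `num_tokens` query vectors.
    A gate scores the experts from the same feature, so different inputs
    (for example aerial versus ground views) can favor different experts.
    """

    def __init__(self, dim, num_experts, num_tokens, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.num_tokens = num_tokens
        self.gate = make_matrix(num_experts, dim, rng)
        self.experts = [
            make_matrix(num_tokens * dim, dim, rng) for _ in range(num_experts)
        ]

    def __call__(self, feature):
        # Gate weights are non-negative and sum to 1.
        weights = softmax(matvec(self.gate, feature))
        # Weighted sum of every expert's flattened token output.
        flat = [0.0] * (self.num_tokens * self.dim)
        for weight, expert in zip(weights, self.experts):
            output = matvec(expert, feature)
            flat = [acc + weight * o for acc, o in zip(flat, output)]
        # Reshape the flat vector into num_tokens vectors of size dim.
        tokens = [
            flat[i * self.dim:(i + 1) * self.dim] for i in range(self.num_tokens)
        ]
        return tokens, weights


generator = ExpertTokenGenerator(dim=8, num_experts=3, num_tokens=4)
feature = [0.5] * 8  # stand-in for a backbone's global feature
tokens, weights = generator(feature)
```

Here `tokens` holds four query vectors of length eight and `weights` holds one gate weight per expert. In a full model such queries would typically attend over patch features, but that step is not described in the text above and is left out.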
