Semantically Self-Aligned Network for Text-to-Image Part-aware Person Re-identification
Zefeng Ding, Changxing Ding, Zhiyin Shao, Dacheng Tao

TL;DR
This paper introduces SSAN, a novel network for text-to-image person re-identification that aligns semantic features across modalities, captures relationships between body parts, and reduces intra-class variance, outperforming existing methods on benchmark datasets.
Contribution
The paper presents a semantically self-aligned network with part-level feature extraction, a multi-view non-local module, and a compound ranking loss, advancing text-to-image ReID methods.
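The summary names the components but not their exact design. The following is a rough illustration only, built on several assumptions. Image parts come from K horizontal stripes of a CNN feature map. Text parts come from softmax attention over word features, using K hypothetical per-part query vectors. Relationships between parts are modeled with a plain single-view non-local (self-attention) step whose learned projections are omitted. This is not SSAN's multi-view non-local module, and `stripe_parts`, `attend_words`, `nonlocal_refine`, and `part_similarity` are hypothetical names.

```python
import numpy as np

# Hypothetical illustration of part-level features; not the SSAN implementation.

def softmax_rows(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax with max subtraction for numerical stability."""
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

def stripe_parts(feature_map: np.ndarray, k: int) -> np.ndarray:
    """Split a (C, H, W) feature map into k horizontal stripes and
    average-pool each stripe, giving k part features of shape (k, C)."""
    stripes = np.array_split(feature_map, k, axis=1)  # split along height
    return np.stack([s.mean(axis=(1, 2)) for s in stripes])

def attend_words(words: np.ndarray, queries: np.ndarray) -> np.ndarray:
    """Pool (n_words, C) word features into k part features, using k
    hypothetical per-part query vectors of shape (k, C)."""
    attn = softmax_rows(queries @ words.T)  # (k, n_words) attention weights
    return attn @ words                     # (k, C)

def nonlocal_refine(parts: np.ndarray) -> np.ndarray:
    """Single-view non-local step over part features (learned projections
    omitted): each part is updated with an affinity-weighted sum of all parts."""
    d = parts.shape[1]
    weights = softmax_rows(parts @ parts.T / np.sqrt(d))  # (k, k) part affinities
    return parts + weights @ parts                         # residual connection

def part_similarity(img_parts: np.ndarray, txt_parts: np.ndarray) -> float:
    """Mean cosine similarity between corresponding image and text parts."""
    a = img_parts / np.linalg.norm(img_parts, axis=1, keepdims=True)
    b = txt_parts / np.linalg.norm(txt_parts, axis=1, keepdims=True)
    return float((a * b).sum(axis=1).mean())

# Toy usage with random inputs: a 256-channel 24x8 feature map and 12 words.
rng = np.random.default_rng(0)
img_parts = nonlocal_refine(stripe_parts(rng.standard_normal((256, 24, 8)), k=6))
txt_parts = nonlocal_refine(attend_words(rng.standard_normal((12, 256)),
                                         rng.standard_normal((6, 256))))
print(img_parts.shape, txt_parts.shape)  # (6, 256) (6, 256)
print(part_similarity(img_parts, txt_parts))
```

Matching parts by index only makes sense if both modalities produce their k parts in the same semantic order. The stripe and query assumptions above only approximate that property, which the paper's self-alignment approach is designed to provide.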
Findings
SSAN outperforms existing methods on benchmark datasets.
The new ICFG-PEDES database facilitates future research.
The proposed components effectively reduce intra-class variance.
Abstract
Text-to-image person re-identification (ReID) aims to search for images containing a person of interest using textual descriptions. However, due to the significant modality gap and the large intra-class variance in textual descriptions, text-to-image ReID remains a challenging problem. Accordingly, in this paper, we propose a Semantically Self-Aligned Network (SSAN) to handle the above problems. First, we propose a novel method that automatically extracts semantically aligned part-level features from the two modalities. Second, we design a multi-view non-local network that captures the relationships between body parts, thereby establishing better correspondences between body parts and noun phrases. Third, we introduce a Compound Ranking (CR) loss that makes use of textual descriptions for other images of the same identity to provide extra supervision, thereby effectively reducing the…
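The abstract does not give the exact form of the Compound Ranking loss. The code below is a minimal sketch that assumes a bidirectional hinge ranking loss with the hardest in-batch negative. In this sketch, an image's own caption is a strong positive. A caption written for another image of the same identity is a weak positive, which receives a smaller margin and a lower weight. The names `ranking_loss` and `compound_ranking_loss`, along with the values of `strong_margin`, `weak_margin`, and `weak_weight`, are hypothetical and are not the paper's settings.

```python
import numpy as np

# Hypothetical sketch of a compound ranking loss; margins and weight are
# illustrative assumptions, not the values used in the paper.

def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def ranking_loss(img, txt, labels, margin):
    """Bidirectional hinge ranking loss with the hardest in-batch negative.
    img[i] and txt[i] form a positive pair; any pair whose identity labels
    differ is a negative."""
    sim = l2_normalize(img) @ l2_normalize(txt).T   # (B, B) cosine similarities
    pos = np.diag(sim)
    neg = np.where(labels[:, None] != labels[None, :], sim, -np.inf)
    hardest_txt = neg.max(axis=1)  # hardest negative text for each image
    hardest_img = neg.max(axis=0)  # hardest negative image for each text
    loss = (np.maximum(0.0, margin - pos + hardest_txt)
            + np.maximum(0.0, margin - pos + hardest_img))
    return float(loss.mean())

def compound_ranking_loss(img, own_txt, other_txt, labels,
                          strong_margin=0.4, weak_margin=0.2, weak_weight=0.5):
    """Strong term: each image vs. its own caption.
    Weak term: each image vs. a caption written for another image of the same
    identity, using a smaller margin and a lower weight."""
    strong = ranking_loss(img, own_txt, labels, strong_margin)
    weak = ranking_loss(img, other_txt, labels, weak_margin)
    return strong + weak_weight * weak

labels = np.array([0, 0, 1, 1])
emb = np.eye(2)[labels]  # toy embeddings: one direction per identity
print(compound_ranking_loss(emb, emb, emb, labels))  # 0.0
```

In this sketch, the smaller weak margin pulls each image toward the identity's other, differently worded descriptions. It does so without requiring as tight a match as the image's own caption, which is one plausible way to use the extra supervision to reduce intra-class variance.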
Taxonomy
Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Human Pose and Action Recognition
