VGSG: Vision-Guided Semantic-Group Network for Text-based Person Search

Shuting He; Hao Luo; Wei Jiang; Xudong Jiang; Henghui Ding

arXiv:2311.07514 · cs.CV · November 14, 2023 · 1 citation


Open Access

TL;DR

This paper introduces VGSG, a novel network for text-based person search that effectively aligns visual and textual features using semantic grouping and vision-guided knowledge transfer, avoiding external tools and complex interactions.

Contribution

The paper proposes a vision-guided semantic-group network with modules for implicit semantic grouping and knowledge transfer, improving cross-modal alignment in person search tasks.

Findings

01. VGSG outperforms state-of-the-art methods on benchmark datasets.

02. The semantic-group textual learning improves local feature extraction.

03. Vision-guided knowledge transfer enhances feature alignment without external tools.

Abstract

Text-based Person Search (TBPS) aims to retrieve images of a target pedestrian indicated by textual descriptions. It is essential for TBPS to extract fine-grained local features and align them across modalities. Existing methods rely on external tools or heavy cross-modal interaction to achieve explicit alignment of cross-modal fine-grained features, which is inefficient and time-consuming. In this work, we propose a Vision-Guided Semantic-Group Network (VGSG) for text-based person search to extract well-aligned fine-grained visual and textual features. In the proposed VGSG, we develop a Semantic-Group Textual Learning (SGTL) module and a Vision-guided Knowledge Transfer (VGKT) module to extract textual local features under the guidance of visual local clues. In SGTL, to obtain the local textual representation, we group textual features along the channel dimension based on the…
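As a rough illustration of what grouping textual features along the channel dimension can look like, here is a minimal NumPy sketch. It is not the authors' SGTL module: the abstract is truncated before it states the grouping criterion, so the equal-size contiguous split, the max-pooling over tokens, the function name `group_channels`, and the shapes used here are all assumptions made purely for illustration.

```python
import numpy as np


def group_channels(token_features: np.ndarray, num_groups: int) -> np.ndarray:
    """Split the channel axis into equal contiguous groups and pool each over tokens.

    Hypothetical helper, not taken from the paper.
    token_features: array of shape (num_tokens, channels)
    returns: array of shape (num_groups, channels // num_groups),
             one pooled "local" vector per channel group.
    """
    num_tokens, channels = token_features.shape
    if channels % num_groups != 0:
        raise ValueError("channels must be divisible by num_groups")
    # Reshape to (num_tokens, num_groups, channels_per_group):
    # group g holds channels [g * cpg, (g + 1) * cpg).
    grouped = token_features.reshape(num_tokens, num_groups, channels // num_groups)
    # Max-pool over the token axis (an assumed pooling choice).
    return grouped.max(axis=0)


# Toy input: 12 tokens with 512-dimensional features (assumed sizes).
rng = np.random.default_rng(0)
features = rng.standard_normal((12, 512))
local = group_channels(features, num_groups=4)
print(local.shape)
```

Each row of `local` depends only on its own slice of channels, which is the sense in which the groups act as separate local representations in this sketch.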


Taxonomy

Topics: Video Surveillance and Tracking Methods · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques

Methods: ALIGN