ASMR: Learning Attribute-Based Person Search with Adaptive Semantic Margin Regularizer
Boseung Jeong, Jicheol Park, Suha Kwak

TL;DR
This paper introduces an adaptive semantic margin regularizer for cross-modal embedding in attribute-based person search, effectively reducing the modality gap and achieving state-of-the-art results.
Contribution
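It proposes a novel loss function that separates person categories by a margin adapted to their semantic distance, with the distance metric learned end-to-end, improving both modality alignment and the discriminability of embeddings for person search.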
Findings
Achieves state-of-the-art performance on public benchmarks.
Effectively reduces the modality gap between images and attributes.
Enhances the discriminative power of embeddings through semantic-aware margins.
Abstract
Attribute-based person search is the task of finding person images that best match a set of text attributes given as a query. The main challenge of this task is the large modality gap between attributes and images. To reduce the gap, we present a new loss for learning cross-modal embeddings in the context of attribute-based person search. We regard a set of attributes as a category of people sharing the same traits. In a joint embedding space of the two modalities, our loss pulls images close to their person categories for modality alignment. More importantly, it pushes apart a pair of person categories by a margin determined adaptively by their semantic distance, where the distance metric is learned end-to-end so that the loss considers the importance of each attribute when relating person categories. Our loss guided by the adaptive semantic margin leads to more discriminative and…
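The two forces described in the abstract (pulling images toward their person category, and pushing categories apart by a semantic-distance-dependent margin) can be illustrated with a small forward-only sketch in plain Python. This is not the authors' formulation: the weighted Hamming distance, the softmax normalization of attribute weights, the squared-Euclidean alignment term, the hinge form, and the names `asmr_style_loss`, `raw_weights`, and `margin_scale` are all hypothetical choices made for illustration. In practice the embeddings and attribute weights would be trained jointly with an autodiff framework.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def attribute_importance(raw_weights: Vector) -> List[float]:
    """Hypothetical: softmax over raw weights, rescaled so they average 1.

    Fixing the total importance keeps a learned metric from shrinking
    every margin to zero simply to lower the loss.
    """
    m = max(raw_weights)
    exps = [math.exp(w - m) for w in raw_weights]
    total = sum(exps)
    k = len(raw_weights)
    return [k * e / total for e in exps]


def semantic_distance(a: Sequence[int], b: Sequence[int], importance: Vector) -> float:
    """Hypothetical metric: importance-weighted Hamming distance of binary attributes."""
    return sum(w * abs(x - y) for w, x, y in zip(importance, a, b))


def asmr_style_loss(
    image_emb: List[Vector],
    image_labels: List[int],
    category_emb: List[Vector],
    category_attrs: List[Sequence[int]],
    raw_weights: Vector,
    margin_scale: float = 0.5,  # hypothetical scale from distance to margin
) -> float:
    # Alignment: mean squared distance from each image to its category embedding.
    align = sum(
        euclidean(x, category_emb[y]) ** 2 for x, y in zip(image_emb, image_labels)
    ) / len(image_emb)

    # Separation: hinge penalty when two categories sit closer than their
    # adaptive margin, i.e. margin_scale * semantic distance of their attributes.
    importance = attribute_importance(raw_weights)
    sep, pairs = 0.0, 0
    n = len(category_emb)
    for i in range(n):
        for j in range(i + 1, n):
            margin = margin_scale * semantic_distance(
                category_attrs[i], category_attrs[j], importance
            )
            gap = euclidean(category_emb[i], category_emb[j])
            sep += max(0.0, margin - gap)
            pairs += 1
    return align + sep / max(pairs, 1)


if __name__ == "__main__":
    attrs = [[1, 0, 1], [1, 1, 1], [0, 1, 0]]    # three toy person categories
    cats = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # their joint-space embeddings
    imgs = [[0.0, 0.0], [1.0, 0.0]]              # images already aligned
    loss = asmr_style_loss(imgs, [0, 1], cats, attrs, [0.0, 0.0, 0.0])
    print(round(loss, 4))  # 0.1667
```

In this toy setup only categories 0 and 2, which disagree on all three attributes, violate their margin (1.5 required, 1.0 actual), so they alone contribute to the loss. Raising an entry of `raw_weights` makes disagreement on that attribute demand a larger separation, which is one way a learned metric can encode attribute importance.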
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Gait Recognition and Analysis
