CLEAR: Cross-Transformers with Pre-trained Language Model is All you need for Person Attribute Recognition and Retrieval

Doanh C. Bui; Thinh V. Le; Ba Hung Ngo; Tae Jong Choi

arXiv:2403.06119 · cs.CV · May 1, 2024 · 3 citations

Open Access

TL;DR

CLEAR is a unified model that uses cross-transformers and a pre-trained language model to improve person attribute recognition and attribute-based person retrieval, narrowing the modality gap between attribute queries and person images and achieving state-of-the-art results.

Contribution

The paper introduces CLEAR, a unified network that combines cross-transformers with a pre-trained language model to improve both person attribute recognition and attribute-based person retrieval.

Findings

01. Achieves state-of-the-art performance on multiple benchmarks.

02. Significantly improves person retrieval on Market-1501.

03. Effectively handles the modality gap with pseudo-descriptions.

Abstract

Person attribute recognition and attribute-based retrieval are two core human-centric tasks. In the recognition task, the challenge is specifying attributes depending on a person's appearance, while the retrieval task involves searching for matching persons based on attribute queries. There is a significant relationship between recognition and retrieval tasks. In this study, we demonstrate that if there is a sufficiently robust network to solve person attribute recognition, it can be adapted to facilitate better performance for the retrieval task. Another issue that needs addressing in the retrieval task is the modality gap between attribute queries and persons' images. Therefore, in this paper, we present CLEAR, a unified network designed to address both tasks. We introduce a robust cross-transformers network to handle person attribute recognition. Additionally, leveraging a…
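
The retrieval setting described above, where an attribute query is matched against person images, can be illustrated with a small sketch. This is not the authors' implementation: the attribute vocabulary `ATTRIBUTES`, the sentence template in `pseudo_description`, and the helpers `cosine` and `rank_gallery` are all hypothetical names chosen for illustration, and toy vectors stand in for the embeddings that a real text encoder and image encoder would produce.

```python
import math

# Hypothetical attribute vocabulary; not taken from the paper.
ATTRIBUTES = ["male", "long hair", "backpack", "hat", "short sleeves"]


def pseudo_description(flags):
    """Turn a binary attribute query into a plain-text sentence.

    The template is an illustrative assumption, not the paper's wording.
    """
    present = [name for name, on in zip(ATTRIBUTES, flags) if on]
    if not present:
        return "A person."
    return "A person with attributes: " + ", ".join(present) + "."


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_gallery(query_vec, gallery):
    """Return gallery ids sorted by descending similarity to the query."""
    scored = [(cosine(query_vec, vec), pid) for pid, vec in gallery.items()]
    return [pid for _, pid in sorted(scored, key=lambda s: -s[0])]


# Toy stand-ins for embeddings: in practice these would come from encoders.
query = [1, 0, 1, 0, 0]
gallery = {"a": [1, 0, 1, 0, 0], "b": [0, 1, 0, 1, 0], "c": [1, 0, 0, 0, 0]}

print(pseudo_description(query))
print(rank_gallery(query, gallery))
```

Mapping the query to text before embedding it is one plausible way to bring attribute queries closer to the input a language model expects; how the embeddings are actually produced and aligned is left to the paper.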

Taxonomy

Topics: Anomaly Detection Techniques and Applications