Pedestrian Attribute Recognition: A New Benchmark Dataset and A Large Language Model Augmented Framework

Jiandong Jin; Xiao Wang; Qian Zhu; Haiyang Wang; Chenglong Li

arXiv:2408.09720·cs.CV·August 20, 2024·2 cites

TL;DR

This paper introduces MSP60K, a large-scale, cross-domain pedestrian attribute dataset, and proposes LLM-PAR, a novel framework augmented with large language models, to improve attribute recognition performance across diverse scenarios.

Contribution

The paper presents a new large-scale, cross-domain pedestrian attribute dataset and an LLM-augmented framework for enhanced recognition, addressing limitations of existing datasets and models.

Findings

01. MSP60K dataset covers 8 scenarios with 60,122 images and 57 attributes.

02. LLM-PAR framework improves attribute recognition accuracy across multiple benchmarks.

03. Synthetic degradation helps bridge the gap between dataset and real-world scenarios.

Abstract

Pedestrian Attribute Recognition (PAR) is one of the indispensable tasks in human-centered research. However, existing datasets neglect different domains (e.g., environments, times, populations, and data sources) and rely only on simple random splits, and performance on these datasets has already approached saturation. No large-scale dataset has been publicly released in the past five years. To address this issue, this paper proposes a new large-scale, cross-domain pedestrian attribute recognition dataset, termed MSP60K, to fill the data gap. It consists of 60,122 images and 57 attribute annotations across eight scenarios. Synthetic degradation is also applied to further narrow the gap between the dataset and challenging real-world scenarios. To establish a more rigorous benchmark, we evaluate 17 representative PAR models under both random and cross-domain split protocols on…
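The difference between a random split and a cross-domain split can be illustrated with a minimal sketch. This is not the MSP60K protocol itself: the sample records, the `scenario` field, the scenario names, the 80/20 ratio, and the choice of held-out scenarios are all hypothetical and serve only to show why a cross-domain split tests generalization to unseen domains.

```python
import random

def random_split(samples, train_ratio=0.8, seed=0):
    """Shuffle all samples together, ignoring scenario labels (hypothetical ratio)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def cross_domain_split(samples, test_scenarios):
    """Hold out whole scenarios so every test image comes from an unseen domain."""
    train = [s for s in samples if s["scenario"] not in test_scenarios]
    test = [s for s in samples if s["scenario"] in test_scenarios]
    return train, test

# Toy data: 8 made-up scenarios with 10 images each (not the real dataset).
samples = [
    {"image": f"img_{sc}_{i}.jpg", "scenario": f"scenario_{sc}"}
    for sc in range(8)
    for i in range(10)
]

rand_train, rand_test = random_split(samples)
cd_train, cd_test = cross_domain_split(samples, {"scenario_6", "scenario_7"})

# Scenarios appearing on each side of the cross-domain split.
cd_train_scenarios = {s["scenario"] for s in cd_train}
cd_test_scenarios = {s["scenario"] for s in cd_test}
```

Under the random split, train and test images are drawn from the same pool of scenarios, whereas the cross-domain split keeps `cd_train_scenarios` and `cd_test_scenarios` disjoint, so a model is scored only on scenarios it never saw during training.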

Taxonomy

Topics: Automated Road and Building Extraction · Human Mobility and Location-Based Analysis · Traffic Prediction and Management Techniques

Methods: Linear Layer · Residual Connection · Layer Normalization · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Vision Transformer