Pedestrian Attribute Recognition: A New Benchmark Dataset and A Large Language Model Augmented Framework
Jiandong Jin, Xiao Wang, Qian Zhu, Haiyang Wang, Chenglong Li

TL;DR
This paper introduces MSP60K, a large-scale, cross-domain pedestrian attribute dataset, and proposes LLM-PAR, a novel framework augmented with large language models, to improve attribute recognition performance across diverse scenarios.
Contribution
The paper presents a new large-scale, cross-domain pedestrian attribute dataset and an LLM-augmented framework for improved recognition, addressing limitations of existing datasets and models.
Findings
The MSP60K dataset covers eight scenarios, with 60,122 images annotated with 57 attributes.
The LLM-PAR framework improves attribute recognition accuracy across multiple benchmarks.
Synthetic degradation helps narrow the gap between the dataset and real-world scenarios.
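The summary does not specify which degradations MSP60K applies. The sketch below shows what synthetic degradation of a pedestrian crop can look like, assuming three illustrative corruptions: additive Gaussian noise, low-light darkening via a gamma curve, and resolution loss via downsample-then-upsample. The function names and parameter values (`sigma`, `gamma`, `factor`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def add_gaussian_noise(img, sigma, rng):
    # Hypothetical corruption: additive zero-mean Gaussian sensor noise.
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def darken(img, gamma):
    # Hypothetical low-light simulation: gamma > 1 pushes intensities toward 0.
    x = img.astype(np.float32) / 255.0
    return np.clip((x ** gamma) * 255.0, 0, 255).astype(np.uint8)

def low_resolution(img, factor):
    # Hypothetical resolution loss: keep every `factor`-th pixel, then
    # nearest-neighbour upsample back and crop to the original size.
    small = img[::factor, ::factor]
    up = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    return up[: img.shape[0], : img.shape[1]]

def degrade(img, kind, rng, sigma=15.0, gamma=2.0, factor=4):
    # Apply one named corruption to an HxWxC uint8 image.
    if kind == "noise":
        return add_gaussian_noise(img, sigma, rng)
    if kind == "low_light":
        return darken(img, gamma)
    if kind == "low_res":
        return low_resolution(img, factor)
    raise ValueError(f"unknown degradation: {kind}")
```

Each corruption keeps the image shape and `uint8` dtype, so degraded crops can replace clean ones in a training or test set without changing the data pipeline.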
Abstract
Pedestrian Attribute Recognition (PAR) is one of the indispensable tasks in human-centered research. However, existing datasets neglect differences across domains (e.g., environments, times, populations, and data sources) and rely only on simple random splits, and performance on these datasets is already approaching saturation. In the past five years, no large-scale dataset has been publicly released. To address this issue, this paper proposes a new large-scale, cross-domain pedestrian attribute recognition dataset, termed MSP60K, to fill the data gap. It consists of 60,122 images with 57 attribute annotations across eight scenarios. Synthetic degradation is also applied to further narrow the gap between the dataset and challenging real-world scenarios. To establish a more rigorous benchmark, we evaluate 17 representative PAR models under both random and cross-domain split protocols on…
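The difference between the two evaluation protocols can be sketched as follows. A random split shuffles all images, so every domain appears in both training and test data. A cross-domain split holds out entire domains, so the test set contains only conditions the model never saw during training. This is a minimal illustration, assuming each sample carries a `domain` label; the domain names and the 0.25 test ratio in the usage example are hypothetical and do not reflect the actual MSP60K split.

```python
import random

def random_split(samples, test_ratio=0.3, seed=0):
    # Shuffle all samples; domains end up mixed across train and test.
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_ratio)
    return items[n_test:], items[:n_test]

def cross_domain_split(samples, test_domains):
    # Hold out whole domains: no test-domain sample is seen in training.
    train = [s for s in samples if s["domain"] not in test_domains]
    test = [s for s in samples if s["domain"] in test_domains]
    return train, test

# Usage with hypothetical domain labels.
samples = [
    {"id": i, "domain": d}
    for i, d in enumerate(
        ["street", "street", "indoor", "indoor", "night", "night", "drone", "drone"]
    )
]
train, test = cross_domain_split(samples, {"night", "drone"})
```

The key property of the cross-domain protocol is that the sets of domains on the two sides are disjoint. This is what makes it a harder test of generalization than a random split.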
Taxonomy
Topics: Automated Road and Building Extraction · Human Mobility and Location-Based Analysis · Traffic Prediction and Management Techniques
Methods: Linear Layer · Residual Connection · Layer Normalization · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Vision Transformer
