An Empirical Study of Mamba-based Pedestrian Attribute Recognition
Xiao Wang, Weizhe Kong, Jiandong Jin, Shiao Wang, Ruichong Gao, Qingchuan Ma, Chenglong Li, Jin Tang

TL;DR
This study evaluates the effectiveness of Mamba-based architectures for pedestrian attribute recognition, comparing them with Transformer models and exploring hybrid designs to optimize accuracy and computational efficiency.
Contribution
It adapts Mamba to pedestrian attribute recognition frameworks, investigates hybrid Mamba-Transformer models, and provides a comprehensive experimental validation of their performance.
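Hybrid designs pair linear-time Mamba blocks with quadratic-cost self-attention blocks. The NumPy sketch below illustrates that contrast under stated assumptions. `selective_scan` is a simplified, diagonal selective state-space recurrence in the spirit of Mamba (no convolution, gating, or bidirectional scanning), not the authors' implementation. The `["mamba", "mamba", "attn"]` schedule, the sizes, and the initialization are hypothetical choices for illustration only.

```python
import numpy as np

def selective_scan(x, A, W_delta, W_B, W_C):
    """Simplified diagonal selective SSM (Mamba-style sketch, not the paper's code).

    x: (L, D) tokens; A: (D, N) negative real diagonal state matrix.
    W_delta (D, D), W_B (D, N), W_C (D, N) make the step size and B/C
    depend on the current token ("selective"). One update per token => O(L).
    """
    L, D = x.shape
    h = np.zeros((D, A.shape[1]))
    y = np.empty_like(x)
    for t in range(L):
        delta = np.logaddexp(0.0, x[t] @ W_delta)   # softplus step size, (D,)
        B, C = x[t] @ W_B, x[t] @ W_C               # input-dependent (N,) each
        A_bar = np.exp(delta[:, None] * A)          # zero-order-hold decay, (D, N)
        h = A_bar * h + (delta[:, None] * B[None, :]) * x[t][:, None]
        y[t] = h @ C                                # read out state, (D,)
    return y

def self_attention(x, Wq, Wk, Wv):
    """Single-head softmax self-attention: builds an (L, L) score matrix, O(L^2)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def hybrid_encoder(x, schedule, params):
    """Pre-norm residual stack that interleaves token mixers per `schedule`."""
    for kind, p in zip(schedule, params):
        mixer = selective_scan if kind == "mamba" else self_attention
        x = x + mixer(layer_norm(x), *p)
    return x

rng = np.random.default_rng(0)
L, D, N = 16, 8, 4
schedule = ["mamba", "mamba", "attn"]  # hypothetical 2:1 interleaving

def init(kind):
    if kind == "mamba":
        A = -np.exp(rng.normal(size=(D, N)))  # negative entries => decaying state
        return (A, 0.1 * rng.normal(size=(D, D)),
                0.1 * rng.normal(size=(D, N)), 0.1 * rng.normal(size=(D, N)))
    return tuple(0.1 * rng.normal(size=(D, D)) for _ in range(3))

params = [init(k) for k in schedule]
tokens = rng.normal(size=(L, D))   # e.g. flattened image patch embeddings (assumed)
out = hybrid_encoder(tokens, schedule, params)
print(out.shape)  # (16, 8)
```

Because the scan performs one state update per token, its cost grows linearly with sequence length, whereas `self_attention` materializes an L × L score matrix. Which interleaving works best is an empirical question; the findings below indicate that adding Transformer components does not automatically help.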
Findings
Interacting with attribute tags does not always improve performance.
Hybrid Mamba-Transformer models can outperform pure Mamba models under certain conditions.
Simply adding Transformer components to Mamba does not guarantee better results.
Abstract
Current strong pedestrian attribute recognition (PAR) models are built on Transformer networks, which are computationally heavy. Recently proposed models with linear complexity (e.g., Mamba) have attracted significant attention and achieve a good balance between accuracy and computational cost across a variety of visual tasks. Relevant review articles also suggest that, while these models perform well on some PAR datasets, they are generally weaker than the corresponding Transformer models. To further tap the potential of the Mamba architecture for PAR, this paper adapts Mamba to two typical PAR frameworks: a text-image fusion approach and a pure-vision Mamba multi-label recognition framework. It is found that interacting with attribute tags as additional input does not always lead to an improvement,…
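In a pure-vision multi-label PAR framework, each pedestrian attribute is usually treated as an independent binary decision. The sketch below shows a generic head of that kind: a linear layer with per-attribute sigmoid outputs, trained with binary cross-entropy. It assumes pooled image features are already available (for example, from a Mamba backbone). The attribute names, feature size, and 0.5 threshold are hypothetical, and the text-image fusion variant is not modeled.

```python
import numpy as np

def multilabel_head(features, W, b):
    """Map pooled features (B, D) to one independent logit per attribute (B, K)."""
    return features @ W + b

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(logits, targets, eps=1e-7):
    """Mean binary cross-entropy over all samples and attributes."""
    p = np.clip(sigmoid(logits), eps, 1 - eps)
    return -np.mean(targets * np.log(p) + (1 - targets) * np.log(1 - p))

def predict(logits, threshold=0.5):
    """Each attribute is switched on independently when its probability >= threshold."""
    return (sigmoid(logits) >= threshold).astype(int)

attributes = ["female", "backpack", "long_hair"]  # hypothetical attribute set
logits = np.array([[2.0, -1.0, 0.0]])
print(predict(logits))                       # [[1 0 1]]
targets = np.array([[1.0, 0.0, 1.0]])
print(round(float(bce_loss(logits, targets)), 4))  # 0.3778

# Batch usage with random stand-in features (D = 8 is an assumption).
rng = np.random.default_rng(0)
W, b = 0.1 * rng.normal(size=(8, len(attributes))), np.zeros(len(attributes))
feats = rng.normal(size=(4, 8))
print(predict(multilabel_head(feats, W, b)).shape)  # (4, 3)
```

Using independent sigmoids rather than a softmax lets several attributes be present at once, which matches the multi-label nature of PAR.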
Taxonomy
Topics: Video Surveillance and Tracking Methods
Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Linear Layer · Label Smoothing · Adam · Dropout · Multi-Head Attention · Dense Connections · Softmax
