An Empirical Study of Mamba-based Pedestrian Attribute Recognition

Xiao Wang; Weizhe Kong; Jiandong Jin; Shiao Wang; Ruichong Gao; Qingchuan Ma; Chenglong Li; Jin Tang

arXiv:2407.10374·cs.CV·December 4, 2024

TL;DR

This study evaluates the effectiveness of Mamba-based architectures for pedestrian attribute recognition, comparing them with Transformer models and exploring hybrid designs to optimize accuracy and computational efficiency.

Contribution

It adapts Mamba into pedestrian attribute recognition frameworks, investigates hybrid Mamba-Transformer models, and provides comprehensive experimental validation of their performance.

Findings

01. Interacting with attribute tags does not always improve performance.

02. Hybrid Mamba-Transformer models can outperform pure Mamba models under certain conditions.

03. Simply adding Transformer components to Mamba does not guarantee better results.

Abstract

Current strong pedestrian attribute recognition (PAR) models are built on Transformer networks, which are computationally heavy. Recently proposed models with linear complexity (e.g., Mamba) have garnered significant attention and have achieved a good balance between accuracy and computational cost across a variety of visual tasks. Relevant review articles also suggest that while these models can perform well on some pedestrian attribute recognition datasets, they are generally weaker than the corresponding Transformer models. To further tap into the potential of the novel Mamba architecture for PAR tasks, this paper designs and adapts Mamba into two typical PAR frameworks, i.e., the text-image fusion approach and the pure-vision Mamba multi-label recognition framework. It is found that interacting with attribute tags as additional input does not always lead to an improvement,…
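
The multi-label recognition setting named in the abstract can be illustrated with a minimal Python sketch. It assumes each attribute is scored independently with a sigmoid and trained with binary cross-entropy, which is a common choice for multi-label recognition and is not confirmed by this page as the authors' exact formulation. The attribute names, logits, and 0.5 threshold are hypothetical, and the backbone (Mamba, Transformer, or hybrid) that would produce the logits is left out.

```python
import math

# Hypothetical attribute vocabulary; real PAR datasets define their own lists.
ATTRIBUTES = ["female", "backpack", "hat", "long_sleeves"]


def sigmoid(x: float) -> float:
    """Map one attribute logit to an independent probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_attributes(logits, names, threshold=0.5):
    """Return every attribute whose probability reaches the threshold.

    Unlike softmax classification, several attributes can be active at once.
    """
    return [name for name, z in zip(names, logits) if sigmoid(z) >= threshold]


def bce_loss(logits, targets):
    """Mean binary cross-entropy over attributes, computed stably from logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Equivalent to -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(z).
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Assumed logits, standing in for the output of a backbone plus a linear head.
logits = [2.0, -1.0, -0.3, 3.5]
targets = [1, 0, 0, 1]

print(predict_attributes(logits, ATTRIBUTES))  # ['female', 'long_sleeves']
print(bce_loss(logits, targets))
```

Because every attribute gets its own sigmoid, the head itself does not depend on the backbone, so the same prediction and loss code would apply to whichever of the architectures compared in the study supplies the logits.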

Code & Models

Repositories

event-ahu/openpar (PyTorch, official)

Taxonomy

Topics: Video Surveillance and Tracking Methods

Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Linear Layer · Label Smoothing · Adam · Dropout · Multi-Head Attention · Dense Connections · Softmax